| author | Tommaso Sciortino <[email protected]> | 2025-05-30 18:25:47 -0700 |
|---|---|---|
| committer | GitHub <[email protected]> | 2025-05-30 18:25:47 -0700 |
| commit | 21fba832d1b4ea7af43fb887d9b2b38fcf8210d0 (patch) | |
| tree | 7200d2fac3a55c385e0a2dac34b5282c942364bc /docs/core | |
| parent | c81148a0cc8489f657901c2cc7247c0834075e1a (diff) | |
Rename server->core (#638)
Diffstat (limited to 'docs/core')
| -rw-r--r-- | docs/core/configuration.md | 110 | ||||
| -rw-r--r-- | docs/core/index.md | 52 | ||||
| -rw-r--r-- | docs/core/tools-api.md | 73 |
3 files changed, 235 insertions, 0 deletions
diff --git a/docs/core/configuration.md b/docs/core/configuration.md
new file mode 100644
index 00000000..4ee47350
--- /dev/null
+++ b/docs/core/configuration.md
@@ -0,0 +1,110 @@
+# Gemini CLI Core: Configuration
+
+Configuration for the Gemini CLI core component (`packages/core`) is critical for its operation, dictating how it connects to the Gemini API, which model it uses, how tools are executed, and more. Many of these settings are shared with or derived from the main CLI configuration when the CLI initializes the core backend.
+
+## Primary Configuration Sources
+
+The core's configuration is primarily established when the `Config` object (from `packages/core/src/config/config.ts`) is instantiated. The values come from a combination of:
+
+1. **Hardcoded Defaults:** Fallback values defined within the core and CLI packages.
+2. **Settings Files (`settings.json` via CLI):** Persistent settings that the CLI reads (User settings `~/.gemini/settings.json`, then Workspace settings `.gemini/settings.json`) and then passes relevant parts to the core configuration.
+3. **Environment Variables (potentially from `.env` files):** System-wide or session-specific variables. The CLI loads `.env` files (checking current directory, then ancestors, then `~/.env`) and these variables influence the core config.
+4. **Command-Line Arguments (passed from CLI):** Settings chosen by the user at launch time, which have the highest precedence for many options.
+
+## Key Configuration Parameters for the Core
+
+These are the main pieces of information the core `Config` object holds and uses:
+
+- **`apiKey` (string):**
+
+  - **Source:** Primarily `process.env.GEMINI_API_KEY` (loaded from the environment or `.env` files).
+  - **Importance:** Absolutely essential for connecting to the Gemini API. (If using Vertex AI, authentication is handled differently, typically via Application Default Credentials - see README.md).
+
+- **`model` (string):**
+
+  - **Source:** Command-line argument (`--model`), environment variable (`GEMINI_MODEL`), or a default value (e.g., `gemini-2.5-pro-preview-05-06`).
+  - **Purpose:** Specifies which Gemini model the core should use. (For Vertex AI model names and usage, refer to the main README.md).
+
+- **`sandbox` (boolean | string):**
+
+  - **Source:** Command-line argument (`--sandbox`), environment variable (`GEMINI_SANDBOX`), or `settings.json` (`sandbox` key).
+  - **Purpose:** Determines if and how tools (especially `execute_bash_command`) are sandboxed. This is crucial for security.
+    - `true`: Use a default sandboxing method.
+    - `false`: No sandboxing (less secure).
+    - `"docker"`, `"podman"`, or a custom command string: Specific sandboxing method.
+
+- **`targetDir` (string):**
+
+  - **Source:** Typically `process.cwd()` (the current working directory from which the CLI was launched).
+  - **Purpose:** Provides a base directory context for tools that operate on the file system (e.g., `read_file`, `list_directory`). Paths used in tool calls are often resolved relative to this directory.
+
+- **`debugMode` (boolean):**
+
+  - **Source:** Command-line argument (`--debug_mode`) or environment variables (e.g., `DEBUG=true`, `DEBUG_MODE=true`).
+  - **Purpose:** Enables verbose logging within the core and its tools, which is helpful for development and troubleshooting.
+
+- **`question` (string | undefined):**
+
+  - **Source:** Command-line argument (`--question`), usually when input is piped to the CLI.
+  - **Purpose:** Allows a direct question to be passed to the core for processing without interactive input.
+
+- **`fullContext` (boolean):**
+
+  - **Source:** Command-line argument (`--all_files`).
+  - **Purpose:** If true, instructs relevant tools (like `read_many_files` when used implicitly by the model) to gather a broad context from the `targetDir`.
+
+- **`toolDiscoveryCommand` (string | undefined):**
+
+- `toolCallCommand` (string | undefined):
+- `mcpServers` (object | undefined):
+  - **Source:** `settings.json` (`mcpServers` key), passed from the CLI.
+  - **Purpose:** Advanced setting for configuring connections to one or more Model-Context Protocol (MCP) servers. This allows the Gemini CLI to discover and utilize tools exposed by these external servers.
+  - **Structure:** An object where each key is a unique server name (alias) and the value is an object containing:
+    - `command` (string, required): The command to execute to start the MCP server.
+    - `args` (array of strings, optional): Arguments for the command.
+    - `env` (object, optional): Environment variables for the server process.
+    - `cwd` (string, optional): Working directory for the server.
+    - `timeout` (number, optional): Request timeout in milliseconds.
+  - **Behavior:** The core will attempt to connect to each configured MCP server. Tool names from these servers might be prefixed with the server alias to prevent naming collisions. The core may also adapt tool schemas from MCP servers for internal compatibility.
+- `mcpServerCommand` (string | undefined, **deprecated**):
+
+  - **Source:** `settings.json` (`mcpServerCommand` key).
+  - **Purpose:** Legacy setting for a single MCP server. Superseded by `mcpServers`.
+
+- **`userAgent` (string):**
+
+  - **Source:** Automatically generated by the CLI, often including CLI package name, version, and Node.js environment details.
+  - **Purpose:** Sent with API requests to help identify the client making requests to the Gemini API.
+
+- **`userMemory` (string):**
+
+  - **Source:** Loaded from the hierarchical `GEMINI.md` files by the CLI (Global, Project Root/Ancestors, Sub-directory) and passed to the core config.
+  - **Purpose:** Contains the combined instructional context provided to the Gemini model.
+  - **Mutability:** This can be updated if the memory is refreshed by the user (e.g., via the `/memory refresh` command in the CLI).
+
+- **`geminiMdFileCount` (number):**
+  - **Source:** Count of all `GEMINI.md` files successfully loaded by the CLI.
+  - **Purpose:** Metadata about the loaded instructional context, visible in the CLI footer.
+
+## Environment File (`.env`) Loading
+
+The CLI configuration logic, which precedes core initialization, includes loading an `.env` file. The search order is:
+
+1. `.env` in the current working directory.
+2. `.env` in parent directories, up to the project root (containing `.git`) or home directory.
+3. `~/.env` (in the user's home directory).
+
+This file is a common place to store the `GEMINI_API_KEY` and other environment-specific settings like `GEMINI_MODEL` or `DEBUG` flags.
+
+```
+# Example .env file
+GEMINI_API_KEY="YOUR_ACTUAL_API_KEY_HERE"
+GEMINI_MODEL="gemini-1.5-flash-latest"
+# DEBUG=true
+```
+
+## Tool Registry Initialization
+
+Upon initialization, the core's `Config` object is also used to create and populate a `ToolRegistry`. This registry is then aware of the `targetDir` and `sandbox` settings, which are vital for the correct and secure operation of tools like `ReadFileTool`, `ShellTool`, etc. The `ToolRegistry` is responsible for making tool schemas available to the Gemini model and for executing tool calls.
+
+Proper core configuration, derived from these various sources, is essential for the Gemini CLI to function correctly, securely, and according to the user's intent.
diff --git a/docs/core/index.md b/docs/core/index.md
new file mode 100644
index 00000000..0a3b9d9e
--- /dev/null
+++ b/docs/core/index.md
@@ -0,0 +1,52 @@
+# Gemini CLI Core
+
+This section delves into the core component of the Gemini CLI (`packages/core`). The core acts as the backend engine, handling communication with the Gemini API, managing tools, and processing requests from the CLI client.
+
+## Role of the Core
+
+The core package is a crucial part of the Gemini CLI ecosystem. While the CLI (`packages/cli`) provides the user interface, the core is responsible for:
+
+- **API Interaction:** Securely communicating with the Google Gemini API, sending user prompts, and receiving model responses.
+- **Prompt Engineering:** Constructing effective prompts for the Gemini model, potentially incorporating conversation history, tool definitions, and instructional context from `GEMINI.md` files.
+- **Tool Management & Orchestration:**
+  - Registering available tools (e.g., file system tools, shell command execution).
+  - Interpreting tool use requests from the Gemini model.
+  - Executing the requested tools with the provided arguments.
+  - Returning tool execution results to the Gemini model for further processing.
+- **Session and State Management:** Keeping track of the conversation state, including history and any relevant context required for coherent interactions.
+- **Configuration:** Managing core-specific configurations, such as API key access, model selection, and tool settings.
+
+## Key Components and Functionality
+
+While the exact implementation details are within the `packages/core/src/` directory, key conceptual components include:
+
+- **API Client** (`client.ts`): A module responsible for making HTTP requests to the Gemini API, handling authentication, and parsing responses.
+- **Prompt Management** (`prompts.ts`): Logic for creating and formatting the prompts sent to the Gemini model. This includes integrating user queries, historical context, and tool specifications.
+- **Tool Registry and Execution** (`tool-registry.ts`, `tools.ts`, individual tool files like `read-file.ts`, `shell.ts`):
+  - A system for discovering, registering, and describing available tools to the Gemini model.
+  - Code for executing each tool safely and effectively, often involving interaction with the operating system or external services.
+- **Configuration (`config.ts`):** Handles loading and providing access to core-side configurations, including API keys, model choices, and potentially tool-specific settings.
+- **Turn Management (`turn.ts`):** Manages the flow of a single conversational turn, from receiving user input to generating a final response, potentially involving multiple tool calls.
+
+## Interaction with the CLI
+
+The CLI and Core typically communicate over a local interface (e.g., standard input/output, or a local network connection if designed for broader use, though the current structure suggests a tightly coupled Node.js application).
+
+1. The CLI captures user input and forwards it to the Core.
+2. The Core processes the input, interacts with the Gemini API and tools as needed.
+3. The Core sends responses (text, tool calls, errors) back to the CLI.
+4. The CLI formats and displays these responses to the user.
+
+## Security Considerations
+
+The core plays a vital role in security:
+
+- **API Key Management:** It handles the `GEMINI_API_KEY` and ensures it is used securely when communicating with the Gemini API.
+- **Tool Execution:** When tools interact with the local system (e.g., `execute_bash_command`), the core (and its underlying tool implementations) must do so with appropriate caution, often involving sandboxing mechanisms to prevent unintended side effects.
+
+## Navigating this Section
+
+- **[Core Configuration](./configuration.md):** Details on how to configure the core component, including environment variables and specific settings.
+- **[Core Tools API](./tools-api.md):** Information on how tools are defined, registered, and used by the core.
+
+Understanding the core's role and architecture is key to comprehending the full capabilities and operational flow of the Gemini CLI.
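
Reviewer note on the configuration docs added above: `configuration.md` says the `model` setting comes from the `--model` argument, then the `GEMINI_MODEL` environment variable, then a default. The sketch below illustrates that layered lookup only. It is not the code in `packages/core/src/config/config.ts`; `ModelSources`, `resolveModel`, `DEFAULT_MODEL`, and the treatment of blank values as unset are assumptions made for illustration, and `example-model` is a placeholder name.

```typescript
// Hypothetical sketch: these names are illustrative and are not taken from packages/core.
interface ModelSources {
  cliArg?: string; // value of --model, if the user passed one
  env: Record<string, string | undefined>; // e.g. process.env
}

// The default named as an example in configuration.md.
const DEFAULT_MODEL = "gemini-2.5-pro-preview-05-06";

function resolveModel(sources: ModelSources): string {
  // Highest precedence first: command-line argument, then environment variable.
  const candidates = [sources.cliArg, sources.env["GEMINI_MODEL"]];
  for (const candidate of candidates) {
    // Assumption: blank values count as "not set" and fall through.
    if (candidate !== undefined && candidate.trim() !== "") {
      return candidate;
    }
  }
  return DEFAULT_MODEL;
}

const envOnly = { GEMINI_MODEL: "gemini-1.5-flash-latest" };
const fromEnv = resolveModel({ env: envOnly });
const fromCli = resolveModel({ cliArg: "example-model", env: envOnly });
const fallback = resolveModel({ env: {} });
console.log(fromEnv, fromCli, fallback);
```

Keeping the lookup as an ordered list of candidates makes the precedence explicit in one place, which is easier to audit than nested `||` chains when more sources (such as `settings.json`) are added.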
diff --git a/docs/core/tools-api.md b/docs/core/tools-api.md
new file mode 100644
index 00000000..1ecc76e2
--- /dev/null
+++ b/docs/core/tools-api.md
@@ -0,0 +1,73 @@
+# Gemini CLI Core: Tools API
+
+The Gemini CLI core (`packages/core`) features a robust system for defining, registering, and executing tools. These tools extend the capabilities of the Gemini model, allowing it to interact with the local environment, fetch web content, and perform various actions beyond simple text generation.
+
+## Core Concepts
+
+- **Tool (`tools.ts`):** An interface and base class (`BaseTool`) that defines the contract for all tools. Each tool must have:
+
+  - `name`: A unique internal name (used in API calls to Gemini).
+  - `displayName`: A user-friendly name.
+  - `description`: A clear explanation of what the tool does, which is provided to the Gemini model.
+  - `parameterSchema`: A JSON schema defining the parameters the tool accepts. This is crucial for the Gemini model to understand how to call the tool correctly.
+  - `validateToolParams()`: A method to validate incoming parameters.
+  - `getDescription()`: A method to provide a human-readable description of what the tool will do with specific parameters before execution.
+  - `shouldConfirmExecute()`: A method to determine if user confirmation is required before execution (e.g., for potentially destructive operations).
+  - `execute()`: The core method that performs the tool's action and returns a `ToolResult`.
+
+- **`ToolResult` (`tools.ts`):** An interface defining the structure of a tool's execution outcome:
+
+  - `llmContent`: The factual string content to be included in the history sent back to the LLM for context.
+  - `returnDisplay`: A user-friendly string (often Markdown) or a special object (like `FileDiff`) for display in the CLI.
+
+- **Tool Registry (`tool-registry.ts`):** A class (`ToolRegistry`) responsible for:
+  - **Registering Tools:** Holding a collection of all available built-in tools (e.g., `ReadFileTool`, `ShellTool`).
+  - **Discovering Tools:** It can also discover tools dynamically:
+    - **Command-based Discovery:** If `toolDiscoveryCommand` is configured in settings, this command is executed. It's expected to output JSON describing custom tools, which are then registered as `DiscoveredTool` instances.
+    - **MCP-based Discovery:** If `mcpServerCommand` is configured, the registry can connect to a Model Context Protocol (MCP) server to list and register tools (`DiscoveredMCPTool`).
+  - **Providing Schemas:** Exposing the `FunctionDeclaration` schemas of all registered tools to the Gemini model, so it knows what tools are available and how to use them.
+  - **Retrieving Tools:** Allowing the core to get a specific tool by name for execution.
+
+## Built-in Tools
+
+The core comes with a suite of pre-defined tools, typically found in `packages/core/src/tools/`. These include:
+
+- **File System Tools:**
+  - `LSTool` (`ls.ts`): Lists directory contents.
+  - `ReadFileTool` (`read-file.ts`): Reads the content of a single file.
+  - `WriteFileTool` (`write-file.ts`): Writes content to a file.
+  - `GrepTool` (`grep.ts`): Searches for patterns in files.
+  - `GlobTool` (`glob.ts`): Finds files matching glob patterns.
+  - `EditTool` (`edit.ts`): Performs in-place modifications to files (often requiring confirmation).
+  - `ReadManyFilesTool` (`read-many-files.ts`): Reads and concatenates content from multiple files or glob patterns (used by the `@` command in CLI).
+- **Execution Tools:**
+  - `ShellTool` (`shell.ts`): Executes arbitrary shell commands (requires careful sandboxing and user confirmation).
+- **Web Tools:**
+  - `WebFetchTool` (`web-fetch.ts`): Fetches content from a URL.
+
+Each of these tools extends `BaseTool` and implements the required methods for its specific functionality.
+
+## Tool Execution Flow
+
+1. **Model Request:** The Gemini model, based on the user's prompt and the provided tool schemas, decides to use a tool and returns a `FunctionCall` part in its response, specifying the tool name and arguments.
+2. **Core Receives Request:** The core parses this `FunctionCall`.
+3. **Tool Retrieval:** It looks up the requested tool in the `ToolRegistry`.
+4. **Parameter Validation:** The tool's `validateToolParams()` method is called.
+5. **Confirmation (if needed):**
+   - The tool's `shouldConfirmExecute()` method is called.
+   - If it returns details for confirmation, the core communicates this back to the CLI, which prompts the user.
+   - The user's decision (e.g., proceed, cancel) is sent back to the core.
+6. **Execution:** If validated and confirmed (or if no confirmation is needed), the core calls the tool's `execute()` method with the provided arguments and an `AbortSignal` (for potential cancellation).
+7. **Result Processing:** The `ToolResult` from `execute()` is received by the core.
+8. **Response to Model:** The `llmContent` from the `ToolResult` is packaged as a `FunctionResponse` and sent back to the Gemini model so it can continue generating a user-facing response.
+9. **Display to User:** The `returnDisplay` from the `ToolResult` is sent to the CLI to show the user what the tool did.
+
+## Extending with Custom Tools
+
+While direct programmatic registration of new tools by users isn't explicitly detailed as a primary workflow in the provided files for typical end-users, the architecture supports extension through:
+
+- **Command-based Discovery:** Advanced users or project administrators can define a `toolDiscoveryCommand` in `settings.json`. This command, when run by the Gemini CLI core, should output a JSON array of `FunctionDeclaration` objects. The core will then make these available as `DiscoveredTool` instances. The corresponding `toolCallCommand` would then be responsible for actually executing these custom tools.
+  \
+- **MCP Server(s):** For more complex scenarios, one or more MCP servers can be set up and configured via the `mcpServers` setting in `settings.json`. The Gemini CLI core can then discover and use tools exposed by these servers. As mentioned, if you have multiple MCP servers, the tool names will be prefixed with the server name from your configuration (e.g., `serverAlias__actualToolName`).
+
+This tool system provides a flexible and powerful way to augment the Gemini model\'s capabilities, making the Gemini CLI a versatile assistant for a wide range of tasks.
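
Reviewer note on the tools documentation added above: the tool contract and steps 3 to 7 of the "Tool Execution Flow" (lookup, validation, optional confirmation, execution) can be sketched roughly as follows. This is a simplified illustration, not the real `BaseTool` or `ToolRegistry` from `packages/core`. `SketchRegistry`, `runToolCall`, `failure`, and `echoTool` are hypothetical names; returning `null` for valid parameters and a plain boolean from `shouldConfirmExecute()` are assumed conventions, whereas the docs describe richer confirmation details.

```typescript
// Hypothetical sketch of the documented flow; not the actual packages/core code.
interface ToolResult {
  llmContent: string; // goes back to the model as context
  returnDisplay: string; // shown to the user in the CLI
}

interface Tool<P> {
  name: string;
  description: string;
  // Assumed convention: an error message when invalid, null when valid.
  validateToolParams(params: P): string | null;
  // Simplified to a boolean for this sketch.
  shouldConfirmExecute(params: P): boolean;
  execute(params: P, signal: AbortSignal): Promise<ToolResult>;
}

class SketchRegistry {
  private tools = new Map<string, Tool<any>>();
  register(tool: Tool<any>): void {
    this.tools.set(tool.name, tool);
  }
  get(name: string): Tool<any> | undefined {
    return this.tools.get(name);
  }
}

function failure(message: string): ToolResult {
  return { llmContent: message, returnDisplay: message };
}

// Lookup, validate, confirm if required, then execute.
async function runToolCall(
  registry: SketchRegistry,
  name: string,
  args: unknown,
  confirm: (toolName: string) => Promise<boolean>,
  signal: AbortSignal,
): Promise<ToolResult> {
  const tool = registry.get(name);
  if (!tool) return failure(`Unknown tool: ${name}`);
  const error = tool.validateToolParams(args);
  if (error !== null) return failure(`Invalid parameters: ${error}`);
  if (tool.shouldConfirmExecute(args) && !(await confirm(name))) {
    return failure("Cancelled by user");
  }
  return tool.execute(args, signal);
}

// A toy tool used only to exercise the flow.
const echoTool: Tool<{ text?: unknown }> = {
  name: "echo",
  description: "Returns the given text unchanged.",
  validateToolParams: (p) => (typeof p.text === "string" ? null : "text must be a string"),
  shouldConfirmExecute: () => false,
  execute: async (p) => ({ llmContent: String(p.text), returnDisplay: `echo: ${String(p.text)}` }),
};

const registry = new SketchRegistry();
registry.register(echoTool);

runToolCall(registry, "echo", { text: "hi" }, async () => true, new AbortController().signal)
  .then((result) => console.log(result.returnDisplay)); // prints "echo: hi"
```

Returning a `ToolResult` for failures, instead of throwing, keeps the two output channels (`llmContent` for the model, `returnDisplay` for the user) populated on every path, so the model can react to a rejected or cancelled call.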
