author     cperry-goog <[email protected]>  2025-05-15 20:04:33 -0700
committer  GitHub <[email protected]>  2025-05-15 20:04:33 -0700
commit     58ef39e2a964386a1026ba68419e4d64c4612551
tree       5c00113b2a92a33ee9bc4f0d4dc03782d3b342b2 /docs/server
parent     3674fb0c7e230651f1f33c4d46b24ca003dd532a
Docs: Add initial project documentation structure and content (#368)
Co-authored-by: Taylor Mullen <[email protected]>
Diffstat (limited to 'docs/server')
-rw-r--r-- docs/server/configuration.md | 99
-rw-r--r-- docs/server/index.md         | 52
-rw-r--r-- docs/server/tools-api.md     | 72
3 files changed, 223 insertions, 0 deletions
diff --git a/docs/server/configuration.md b/docs/server/configuration.md
new file mode 100644
index 00000000..c08c6ba4
--- /dev/null
+++ b/docs/server/configuration.md
@@ -0,0 +1,99 @@
+# Gemini CLI Server: Configuration
+
+Configuration for the Gemini CLI server component (`packages/server`) determines how it connects to the Gemini API, which model it uses, how tools are executed, and more. Many of these settings are shared with or derived from the main CLI configuration when the CLI initializes the server backend.
+
+## Primary Configuration Sources
+
+The server's configuration is primarily established when the `Config` object (from `packages/server/src/config/config.ts`) is instantiated. The values come from a combination of:
+
+1. **Hardcoded Defaults:** Fallback values defined within the server and CLI packages.
+2. **Settings Files (`settings.json` via CLI):** Persistent settings that the CLI reads (User settings `~/.gemini/settings.json`, then Workspace settings `.gemini/settings.json`) and then passes relevant parts to the server configuration.
+3. **Environment Variables (potentially from `.env` files):** System-wide or session-specific variables. The CLI loads `.env` files (checking current directory, then ancestors, then `~/.env`) and these variables influence the server config.
+4. **Command-Line Arguments (passed from CLI):** Settings chosen by the user at launch time, which have the highest precedence for many options.
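+
+The sources above can be pictured as layers merged from lowest to highest precedence, with each layer overriding only the keys it actually sets. The sketch below assumes exactly that ordering (defaults, user settings, workspace settings, environment, command-line arguments). The `ServerSettings` shape and the `mergeLayers` helper are hypothetical. The real `Config` construction may resolve some options differently, because command-line arguments are only said to win "for many options".
+
+```typescript
+// Hypothetical subset of the server settings documented below.
+interface ServerSettings {
+  model?: string;
+  sandbox?: boolean | string;
+  debugMode?: boolean;
+}
+
+// Merge layers from lowest to highest precedence. Keys whose value is
+// undefined are skipped, so an unset option never erases an earlier layer
+// (a plain object spread would copy the explicit undefined).
+function mergeLayers(...layers: ServerSettings[]): ServerSettings {
+  const result: ServerSettings = {};
+  for (const layer of layers) {
+    for (const [key, value] of Object.entries(layer)) {
+      if (value !== undefined) Object.assign(result, { [key]: value });
+    }
+  }
+  return result;
+}
+
+const defaults: ServerSettings = { model: "gemini-2.5-pro-preview-05-06", sandbox: false, debugMode: false };
+const userSettings: ServerSettings = { sandbox: "docker" };
+const workspaceSettings: ServerSettings = {};
+const envLayer: ServerSettings = { model: "gemini-1.5-flash-latest" };
+const cliArgs: ServerSettings = { debugMode: true, model: undefined };
+
+// model comes from the environment, sandbox from user settings, debugMode from the CLI.
+const effective = mergeLayers(defaults, userSettings, workspaceSettings, envLayer, cliArgs);
+console.log(effective);
+```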
+
+## Key Configuration Parameters for the Server
+
+These are the main pieces of information the server `Config` object holds and uses:
+
+- **`apiKey` (string):**
+
+ - **Source:** Primarily `process.env.GEMINI_API_KEY` (loaded from the environment or `.env` files).
+ - **Importance:** Required. The server cannot communicate with the Gemini API without it.
+
+- **`model` (string):**
+
+ - **Source:** Command-line argument (`--model`), environment variable (`GEMINI_CODE_MODEL`), or the default value `gemini-2.5-pro-preview-05-06`.
+ - **Purpose:** Specifies which Gemini model the server should use for generating responses.
+
+- **`sandbox` (boolean | string):**
+
+ - **Source:** Command-line argument (`--sandbox`), environment variable (`GEMINI_CODE_SANDBOX`), or `settings.json` (`sandbox` key).
+ - **Purpose:** Determines if and how tools (especially `execute_bash_command`) are sandboxed. This is crucial for security.
+   - `true`: Use a default sandboxing method.
+   - `false`: No sandboxing (less secure).
+   - `"docker"`, `"podman"`, or a custom command string: Specific sandboxing method.
+
+- **`targetDir` (string):**
+
+ - **Source:** Typically `process.cwd()` (the current working directory from which the CLI was launched).
+ - **Purpose:** Provides a base directory context for tools that operate on the file system (e.g., `read_file`, `list_directory`). Paths used in tool calls are often resolved relative to this directory.
+
+- **`debugMode` (boolean):**
+
+ - **Source:** Command-line argument (`--debug_mode`) or environment variables (e.g., `DEBUG=true`, `DEBUG_MODE=true`).
+ - **Purpose:** Enables verbose logging within the server and its tools, which is helpful for development and troubleshooting.
+
+- **`question` (string | undefined):**
+
+ - **Source:** Command-line argument (`--question`), usually when input is piped to the CLI.
+ - **Purpose:** Allows a direct question to be passed to the server for processing without interactive input.
+
+- **`fullContext` (boolean):**
+
+ - **Source:** Command-line argument (`--all_files`).
+ - **Purpose:** If true, instructs relevant tools (like `read_many_files` when used implicitly by the model) to gather a broad context from the `targetDir`.
+
+- **`toolDiscoveryCommand`, `toolCallCommand`, `mcpServerCommand` (string | undefined):**
+
+ - **Source:** `settings.json` or environment variables.
+ - **Purpose:** Advanced settings for extending the available tools. `toolDiscoveryCommand` is run to print JSON describing custom tools, `toolCallCommand` executes those discovered tools, and `mcpServerCommand` configures a Model Context Protocol (MCP) server from which the tool registry can discover further tools.
+
+- **`userAgent` (string):**
+
+ - **Source:** Automatically generated by the CLI, often including CLI package name, version, and Node.js environment details.
+ - **Purpose:** Sent with API requests to help identify the client making requests to the Gemini API.
+
+- **`userMemory` (string):**
+
+ - **Source:** Loaded from the hierarchical `GEMINI.md` files by the CLI (Global, Project Root/Ancestors, Sub-directory) and passed to the server config.
+ - **Purpose:** Contains the combined instructional context provided to the Gemini model.
+ - **Mutability:** This can be updated if the memory is refreshed by the user (e.g., via the `/refreshmemory` command in the CLI).
+
+- **`geminiMdFileCount` (number):**
+ - **Source:** Count of all `GEMINI.md` files successfully loaded by the CLI.
+ - **Purpose:** Metadata about the loaded instructional context, visible in the CLI footer.
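+
+Both the `model` and `sandbox` entries above name several sources. The following minimal sketch resolves them, assuming the sources are listed in precedence order and taking the environment as a plain object instead of reading `process.env`. The document does not say how string values such as `"true"` or `"false"` from `GEMINI_CODE_SANDBOX` are interpreted, or what an unset sandbox means. The `normalizeSandbox` rules are therefore assumptions, as are all function and type names.
+
+```typescript
+type SandboxSetting = boolean | string;
+
+// Normalized result: no sandbox, the default method, or an explicit command.
+type SandboxMode =
+  | { kind: "none" }
+  | { kind: "default" }
+  | { kind: "command"; command: string }; // e.g. "docker", "podman", or a custom command
+
+const DEFAULT_MODEL = "gemini-2.5-pro-preview-05-06";
+
+// Assumed rules: unset or false means no sandbox, true means the default
+// method, "true"/"1" and "false"/"0" strings behave like booleans, and any
+// other non-empty string is taken as the sandbox command.
+function normalizeSandbox(value: SandboxSetting | undefined): SandboxMode {
+  if (value === undefined || value === false) return { kind: "none" };
+  if (value === true) return { kind: "default" };
+  const trimmed = value.trim();
+  const lower = trimmed.toLowerCase();
+  if (lower === "" || lower === "false" || lower === "0") return { kind: "none" };
+  if (lower === "true" || lower === "1") return { kind: "default" };
+  return { kind: "command", command: trimmed };
+}
+
+// --model, then GEMINI_CODE_MODEL, then the documented default.
+function resolveModel(cliModel: string | undefined, env: Record<string, string | undefined>): string {
+  return cliModel ?? env["GEMINI_CODE_MODEL"] ?? DEFAULT_MODEL;
+}
+
+// --sandbox, then GEMINI_CODE_SANDBOX, then the `sandbox` key in settings.json.
+function resolveSandbox(
+  cliSandbox: SandboxSetting | undefined,
+  env: Record<string, string | undefined>,
+  settingsSandbox: SandboxSetting | undefined,
+): SandboxMode {
+  return normalizeSandbox(cliSandbox ?? env["GEMINI_CODE_SANDBOX"] ?? settingsSandbox);
+}
+
+const env = { GEMINI_CODE_SANDBOX: "podman" };
+console.log(resolveModel(undefined, env)); // gemini-2.5-pro-preview-05-06
+console.log(resolveSandbox(undefined, env, true)); // the env value "podman" wins over settings
+```
+
+The sketch uses `??` rather than `||` so that an explicit `--sandbox=false` still overrides a sandbox enabled in the environment or settings.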
+
+## Environment File (`.env`) Loading
+
+The CLI configuration logic, which runs before the server is initialized, loads a `.env` file. The search order is:
+
+1. `.env` in the current working directory.
+2. `.env` in parent directories, up to the project root (containing `.git`) or home directory.
+3. `~/.env` (in the user's home directory).
+
+This file is a common place to store the `GEMINI_API_KEY` and other environment-specific settings like `GEMINI_CODE_MODEL` or `DEBUG` flags.
+
+```
+# Example .env file
+GEMINI_API_KEY="YOUR_ACTUAL_API_KEY_HERE"
+GEMINI_CODE_MODEL="gemini-1.5-flash-latest"
+# DEBUG=true
+```
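+
+The search order above can be sketched as a pure function that lists candidate paths. This sketch makes several assumptions: it uses POSIX paths, it includes the project-root and home directories themselves in the walk, and it uses a hypothetical `hasGitDir` predicate in place of a real file-system check. It does not decide whether the CLI stops at the first file it finds or merges several.
+
+```typescript
+// POSIX-only helpers so the sketch needs no file-system access.
+function parentDir(dir: string): string {
+  const idx = dir.lastIndexOf("/");
+  return idx <= 0 ? "/" : dir.slice(0, idx);
+}
+
+function envPathIn(dir: string): string {
+  return dir === "/" ? "/.env" : `${dir}/.env`;
+}
+
+// Candidate .env paths in the documented order: the working directory, its
+// parents up to the project root (contains .git) or home, then ~/.env.
+function envFileCandidates(cwd: string, home: string, hasGitDir: (dir: string) => boolean): string[] {
+  const candidates: string[] = [];
+  let dir = cwd;
+  for (;;) {
+    candidates.push(envPathIn(dir));
+    if (hasGitDir(dir) || dir === home) break; // project root or home reached
+    const parent = parentDir(dir);
+    if (parent === dir) break; // file-system root reached
+    dir = parent;
+  }
+  const homeEnv = envPathIn(home);
+  if (!candidates.includes(homeEnv)) candidates.push(homeEnv); // avoid checking ~/.env twice
+  return candidates;
+}
+
+// Hypothetical layout: /home/user/project contains .git.
+const isProjectRoot = (dir: string) => dir === "/home/user/project";
+console.log(envFileCandidates("/home/user/project/src", "/home/user", isProjectRoot));
+// src/.env, then project/.env (the walk stops at .git), then ~/.env
+```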
+
+## Tool Registry Initialization
+
+Upon initialization, the server's `Config` object is also used to create and populate a `ToolRegistry`. This registry is then aware of the `targetDir` and `sandbox` settings, which are vital for the correct and secure operation of tools like `ReadFileTool`, `ShellTool`, etc. The `ToolRegistry` is responsible for making tool schemas available to the Gemini model and for executing tool calls.
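+
+As a rough illustration of that wiring, the sketch below builds a registry from a config object and passes `targetDir` to a `read_file`-style tool, which resolves relative paths against it. All type and function names are hypothetical. Files live in an in-memory `Map` instead of on disk, and the path handling is deliberately naive, with no `..` normalization or containment check.
+
+```typescript
+// Hypothetical, pared-down shapes; the real Config and ToolRegistry are richer.
+interface MiniConfig {
+  targetDir: string;
+  sandbox: boolean | string;
+}
+
+interface MiniTool {
+  name: string;
+  run(args: Record<string, string>): string;
+}
+
+class MiniToolRegistry {
+  private readonly tools = new Map<string, MiniTool>();
+
+  register(tool: MiniTool): void {
+    this.tools.set(tool.name, tool);
+  }
+
+  get(name: string): MiniTool | undefined {
+    return this.tools.get(name);
+  }
+}
+
+// Naive POSIX resolution against targetDir: absolute paths pass through unchanged.
+function resolveInTarget(targetDir: string, p: string): string {
+  return p.startsWith("/") ? p : `${targetDir.replace(/\/+$/, "")}/${p}`;
+}
+
+// A read_file-style tool; an in-memory Map stands in for the file system.
+function makeReadFileTool(config: MiniConfig, files: Map<string, string>): MiniTool {
+  return {
+    name: "read_file",
+    run(args) {
+      const path = resolveInTarget(config.targetDir, args["path"] ?? "");
+      return files.get(path) ?? `File not found: ${path}`;
+    },
+  };
+}
+
+// Build the registry from the config so every tool shares the same targetDir.
+function createToolRegistry(config: MiniConfig, files: Map<string, string>): MiniToolRegistry {
+  const registry = new MiniToolRegistry();
+  registry.register(makeReadFileTool(config, files));
+  // A shell tool would be constructed here with config.sandbox to choose how commands run.
+  return registry;
+}
+
+const config: MiniConfig = { targetDir: "/work/repo", sandbox: false };
+const registry = createToolRegistry(config, new Map([["/work/repo/README.md", "# Hello"]]));
+console.log(registry.get("read_file")?.run({ path: "README.md" })); // # Hello
+```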
+
+Proper server configuration, derived from these various sources, is essential for the Gemini CLI to function correctly, securely, and according to the user's intent.
diff --git a/docs/server/index.md b/docs/server/index.md
new file mode 100644
index 00000000..ae8334e6
--- /dev/null
+++ b/docs/server/index.md
@@ -0,0 +1,52 @@
+# Gemini CLI Server
+
+This section describes the server component of the Gemini CLI (`packages/server`). The server acts as the backend engine: it handles communication with the Gemini API, manages tools, and processes requests from the CLI client.
+
+## Role of the Server
+
+The server package is a crucial part of the Gemini CLI ecosystem. While the CLI (`packages/cli`) provides the user interface, the server is responsible for:
+
+- **API Interaction:** Securely communicating with the Google Gemini API, sending user prompts, and receiving model responses.
+- **Prompt Engineering:** Constructing effective prompts for the Gemini model, potentially incorporating conversation history, tool definitions, and instructional context from `GEMINI.md` files.
+- **Tool Management & Orchestration:**
+ - Registering available tools (e.g., file system tools, shell command execution).
+ - Interpreting tool use requests from the Gemini model.
+ - Executing the requested tools with the provided arguments.
+ - Returning tool execution results to the Gemini model for further processing.
+- **Session and State Management:** Keeping track of the conversation state, including history and any relevant context required for coherent interactions.
+- **Configuration:** Managing server-specific configurations, such as API key access, model selection, and tool settings.
+
+## Key Components and Functionality
+
+While the exact implementation details are within the `packages/server/src/` directory, key conceptual components include:
+
+- **API Client (`client.ts`):** A module responsible for making HTTP requests to the Gemini API, handling authentication, and parsing responses.
+- **Prompt Management (`prompts.ts`):** Logic for creating and formatting the prompts sent to the Gemini model. This includes integrating user queries, historical context, and tool specifications.
+- **Tool Registry and Execution (`tool-registry.ts`, `tools.ts`, individual tool files like `read-file.ts`, `shell.ts`):**
+ - A system for discovering, registering, and describing available tools to the Gemini model.
+ - Code for executing each tool safely and effectively, often involving interaction with the operating system or external services.
+- **Configuration (`config.ts`):** Handles loading and providing access to server-side configurations, including API keys, model choices, and potentially tool-specific settings.
+- **Turn Management (`turn.ts`):** Manages the flow of a single conversational turn, from receiving user input to generating a final response, potentially involving multiple tool calls.
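+
+The turn flow described for `turn.ts` amounts to a loop. The server calls the model, runs any tools it requests, and feeds the results back, stopping once the model replies with text. The sketch below shows that loop with a scripted stand-in for the Gemini API. The message shapes, the `runTurn` name, and the `maxSteps` safety cap are assumptions for illustration, not the contents of `turn.ts`.
+
+```typescript
+// Hypothetical message shapes, loosely modeled on Gemini's functionCall parts.
+type ModelReply =
+  | { type: "text"; text: string }
+  | { type: "functionCalls"; calls: { name: string; args: Record<string, unknown> }[] };
+
+type HistoryEntry =
+  | { role: "user"; text: string }
+  | { role: "model"; reply: ModelReply }
+  | { role: "tool"; name: string; output: string };
+
+type Model = (history: HistoryEntry[]) => ModelReply;
+type ToolRunner = (name: string, args: Record<string, unknown>) => string;
+
+// One turn: call the model, run any requested tools, append their output to
+// the history, and repeat until the model answers with text.
+function runTurn(userInput: string, model: Model, runTool: ToolRunner, maxSteps = 10): string {
+  const history: HistoryEntry[] = [{ role: "user", text: userInput }];
+  for (let step = 0; step < maxSteps; step++) {
+    const reply = model(history);
+    history.push({ role: "model", reply });
+    if (reply.type === "text") return reply.text;
+    for (const call of reply.calls) {
+      history.push({ role: "tool", name: call.name, output: runTool(call.name, call.args) });
+    }
+  }
+  throw new Error(`Turn did not finish within ${maxSteps} model calls`);
+}
+
+// Scripted stand-in for the Gemini API: request a tool first, then answer.
+const fakeModel: Model = (history) => {
+  const last = history[history.length - 1];
+  if (last.role === "tool") return { type: "text", text: `The file says: ${last.output}` };
+  return { type: "functionCalls", calls: [{ name: "read_file", args: { path: "notes.txt" } }] };
+};
+const fakeTools: ToolRunner = (name, args) => `${name}(${JSON.stringify(args)}) -> hello`;
+
+console.log(runTurn("What is in notes.txt?", fakeModel, fakeTools));
+```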
+
+## Interaction with the CLI
+
+The current structure suggests a tightly coupled Node.js application in which the CLI and Server run together rather than communicating over a separate local interface. A transport such as standard input/output or a local network connection would matter only if the Server were designed for broader use. In either case, the flow is:
+
+1. The CLI captures user input and forwards it to the Server.
+2. The Server processes the input and interacts with the Gemini API and tools as needed.
+3. The Server sends responses (text, tool calls, errors) back to the CLI.
+4. The CLI formats and displays these responses to the user.
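+
+The server can return text, tool calls, or errors, so one natural way to model step 3 is a discriminated union that the CLI switches on in step 4. The `ServerEvent` type and `formatForDisplay` function below are hypothetical and only sketch the idea.
+
+```typescript
+// Hypothetical shapes for what the server hands back to the CLI.
+type ServerEvent =
+  | { kind: "text"; text: string }
+  | { kind: "toolCall"; toolName: string; description: string }
+  | { kind: "error"; message: string };
+
+// CLI side: turn each event into a line of terminal output.
+function formatForDisplay(event: ServerEvent): string {
+  switch (event.kind) {
+    case "text":
+      return event.text;
+    case "toolCall":
+      return `[tool] ${event.toolName}: ${event.description}`;
+    case "error":
+      return `Error: ${event.message}`;
+    default: {
+      // Compile-time check that every event kind is handled.
+      const unhandled: never = event;
+      return unhandled;
+    }
+  }
+}
+
+const serverEvents: ServerEvent[] = [
+  { kind: "toolCall", toolName: "read_file", description: "Read README.md" },
+  { kind: "text", text: "The README describes the project." },
+];
+for (const e of serverEvents) console.log(formatForDisplay(e));
+```
+
+The `never` assignment in the `default` branch makes the compiler flag any new event kind that the CLI forgets to render.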
+
+## Security Considerations
+
+The server plays a vital role in security:
+
+- **API Key Management:** It handles the `GEMINI_API_KEY` and ensures it is used securely when communicating with the Gemini API.
+- **Tool Execution:** When tools interact with the local system (e.g., `execute_bash_command`), the server (and its underlying tool implementations) must do so with appropriate caution, often involving sandboxing mechanisms to prevent unintended side effects.
+
+## Navigating this Section
+
+- **[Server Configuration](./configuration.md):** Details on how to configure the server component, including environment variables and specific settings.
+- **[Server Tools API](./tools-api.md):** Information on how tools are defined, registered, and used by the server.
+
+Understanding the server's role and architecture is key to comprehending the full capabilities and operational flow of the Gemini CLI.
diff --git a/docs/server/tools-api.md b/docs/server/tools-api.md
new file mode 100644
index 00000000..a6624fe4
--- /dev/null
+++ b/docs/server/tools-api.md
@@ -0,0 +1,72 @@
+# Gemini CLI Server: Tools API
+
+The Gemini CLI server (`packages/server`) features a robust system for defining, registering, and executing tools. These tools extend the capabilities of the Gemini model, allowing it to interact with the local environment, fetch web content, and perform various actions beyond simple text generation.
+
+## Core Concepts
+
+- **Tool (`tools.ts`):** An interface and a base class (`BaseTool`) that define the contract for all tools. Each tool must have:
+
+ - `name`: A unique internal name (used in API calls to Gemini).
+ - `displayName`: A user-friendly name.
+ - `description`: A clear explanation of what the tool does, which is provided to the Gemini model.
+ - `parameterSchema`: A JSON schema defining the parameters the tool accepts. This is crucial for the Gemini model to understand how to call the tool correctly.
+ - `validateToolParams()`: A method to validate incoming parameters.
+ - `getDescription()`: A method to provide a human-readable description of what the tool will do with specific parameters before execution.
+ - `shouldConfirmExecute()`: A method to determine if user confirmation is required before execution (e.g., for potentially destructive operations).
+ - `execute()`: The core method that performs the tool's action and returns a `ToolResult`.
+
+- **`ToolResult` (`tools.ts`):** An interface defining the structure of a tool's execution outcome:
+
+ - `llmContent`: The factual string content to be included in the history sent back to the LLM for context.
+ - `returnDisplay`: A user-friendly string (often Markdown) or a special object (like `FileDiff`) for display in the CLI.
+
+- **Tool Registry (`tool-registry.ts`):** A class (`ToolRegistry`) responsible for:
+ - **Registering Tools:** Holding a collection of all available built-in tools (e.g., `ReadFileTool`, `ShellTool`).
+ - **Discovering Tools:** It can also discover tools dynamically:
+ - **Command-based Discovery:** If `toolDiscoveryCommand` is configured in settings, this command is executed. It's expected to output JSON describing custom tools, which are then registered as `DiscoveredTool` instances.
+ - **MCP-based Discovery:** If `mcpServerCommand` is configured, the registry can connect to a Model Context Protocol (MCP) server to list and register tools (`DiscoveredMCPTool`).
+ - **Providing Schemas:** Exposing the `FunctionDeclaration` schemas of all registered tools to the Gemini model, so it knows what tools are available and how to use them.
+ - **Retrieving Tools:** Allowing the server to get a specific tool by name for execution.
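+
+The registering, schema-providing, and retrieval responsibilities can be sketched with a `Map` keyed by tool name. This is a simplified, hypothetical `SketchToolRegistry`, not the real `ToolRegistry`. Rejecting duplicate names is a choice made here because tool names must be unique, and discovery is omitted.
+
+```typescript
+// Hypothetical minimal tool shape: only what the registry needs.
+interface DescribedTool {
+  name: string;
+  description: string;
+  parameterSchema: Record<string, unknown>; // JSON schema for the tool's parameters
+}
+
+// What the model is shown for each tool, modeled on a Gemini function declaration.
+interface FunctionDeclaration {
+  name: string;
+  description: string;
+  parameters: Record<string, unknown>;
+}
+
+class SketchToolRegistry<T extends DescribedTool> {
+  private readonly tools = new Map<string, T>();
+
+  // Registering: names must be unique, so this sketch rejects duplicates.
+  register(tool: T): void {
+    if (this.tools.has(tool.name)) {
+      throw new Error(`Tool "${tool.name}" is already registered`);
+    }
+    this.tools.set(tool.name, tool);
+  }
+
+  // Providing schemas: one declaration per registered tool.
+  getFunctionDeclarations(): FunctionDeclaration[] {
+    return Array.from(this.tools.values(), (t) => ({
+      name: t.name,
+      description: t.description,
+      parameters: t.parameterSchema,
+    }));
+  }
+
+  // Retrieving tools: look up the name the model used in its FunctionCall.
+  getTool(name: string): T | undefined {
+    return this.tools.get(name);
+  }
+}
+
+const toolRegistry = new SketchToolRegistry<DescribedTool>();
+toolRegistry.register({
+  name: "list_directory",
+  description: "Lists the entries of a directory.",
+  parameterSchema: { type: "object", properties: { path: { type: "string" } }, required: ["path"] },
+});
+console.log(toolRegistry.getFunctionDeclarations());
+```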
+
+## Built-in Tools
+
+The server comes with a suite of pre-defined tools, typically found in `packages/server/src/tools/`. These include:
+
+- **File System Tools:**
+ - `LSTool` (`ls.ts`): Lists directory contents.
+ - `ReadFileTool` (`read-file.ts`): Reads the content of a single file.
+ - `WriteFileTool` (`write-file.ts`): Writes content to a file.
+ - `GrepTool` (`grep.ts`): Searches for patterns in files.
+ - `GlobTool` (`glob.ts`): Finds files matching glob patterns.
+ - `EditTool` (`edit.ts`): Performs in-place modifications to files (often requiring confirmation).
+ - `ReadManyFilesTool` (`read-many-files.ts`): Reads and concatenates content from multiple files or glob patterns (used by the `@` command in the CLI).
+- **Execution Tools:**
+ - `ShellTool` (`shell.ts`): Executes arbitrary shell commands (requires careful sandboxing and user confirmation).
+- **Web Tools:**
+ - `WebFetchTool` (`web-fetch.ts`): Fetches content from a URL.
+
+Each of these tools extends `BaseTool` and implements the required methods for its specific functionality.
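+
+To make the `BaseTool` contract concrete, here is a hypothetical tool that writes into an in-memory map rather than the file system. Several pieces are assumptions: `SketchBaseTool`, `MemoryWriteFileTool`, the `ConfirmationDetails` shape, and the convention that `validateToolParams()` returns an error string or `null`. The real signatures in `tools.ts` may differ.
+
+```typescript
+interface ToolResult {
+  llmContent: string; // factual content sent back to the model
+  returnDisplay: string; // user-facing text for the CLI
+}
+
+// Hypothetical confirmation payload; the real shape is not described here.
+interface ConfirmationDetails {
+  title: string;
+}
+
+// Simplified stand-in for BaseTool: metadata in the constructor, behavior in subclasses.
+abstract class SketchBaseTool<P> {
+  constructor(
+    readonly name: string,
+    readonly displayName: string,
+    readonly description: string,
+    readonly parameterSchema: Record<string, unknown>,
+  ) {}
+
+  abstract validateToolParams(params: P): string | null; // error message, or null if valid
+  abstract getDescription(params: P): string;
+  abstract shouldConfirmExecute(params: P): ConfirmationDetails | false;
+  abstract execute(params: P, signal: AbortSignal): Promise<ToolResult>;
+}
+
+interface WriteParams {
+  path: string;
+  content: string;
+}
+
+// Writes into an in-memory Map instead of the real file system.
+class MemoryWriteFileTool extends SketchBaseTool<WriteParams> {
+  constructor(private readonly files: Map<string, string>) {
+    super("write_file", "WriteFile", "Writes content to a file.", {
+      type: "object",
+      properties: { path: { type: "string" }, content: { type: "string" } },
+      required: ["path", "content"],
+    });
+  }
+
+  validateToolParams(params: WriteParams): string | null {
+    return params.path.trim() === "" ? "path must not be empty" : null;
+  }
+
+  getDescription(params: WriteParams): string {
+    return `Write ${params.content.length} characters to ${params.path}`;
+  }
+
+  // Overwriting an existing file is destructive, so ask first; new files need no prompt.
+  shouldConfirmExecute(params: WriteParams): ConfirmationDetails | false {
+    return this.files.has(params.path) ? { title: `Overwrite ${params.path}?` } : false;
+  }
+
+  async execute(params: WriteParams, signal: AbortSignal): Promise<ToolResult> {
+    if (signal.aborted) {
+      return { llmContent: "Write was cancelled.", returnDisplay: "Cancelled." };
+    }
+    this.files.set(params.path, params.content);
+    return {
+      llmContent: `Wrote ${params.content.length} characters to ${params.path}.`,
+      returnDisplay: `Wrote **${params.path}**`,
+    };
+  }
+}
+
+const memoryFiles = new Map<string, string>([["notes.md", "old"]]);
+const writeTool = new MemoryWriteFileTool(memoryFiles);
+console.log(writeTool.getDescription({ path: "notes.md", content: "new" })); // Write 3 characters to notes.md
+console.log(writeTool.shouldConfirmExecute({ path: "notes.md", content: "new" })); // asks before overwriting
+```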
+
+## Tool Execution Flow
+
+1. **Model Request:** The Gemini model, based on the user's prompt and the provided tool schemas, decides to use a tool and returns a `FunctionCall` part in its response, specifying the tool name and arguments.
+2. **Server Receives Request:** The server parses this `FunctionCall`.
+3. **Tool Retrieval:** It looks up the requested tool in the `ToolRegistry`.
+4. **Parameter Validation:** The tool's `validateToolParams()` method is called.
+5. **Confirmation (if needed):**
+ - The tool's `shouldConfirmExecute()` method is called.
+ - If it returns details for confirmation, the server communicates this back to the CLI, which prompts the user.
+ - The user's decision (e.g., proceed, cancel) is sent back to the server.
+6. **Execution:** If validated and confirmed (or if no confirmation is needed), the server calls the tool's `execute()` method with the provided arguments and an `AbortSignal` (for potential cancellation).
+7. **Result Processing:** The `ToolResult` from `execute()` is received by the server.
+8. **Response to Model:** The `llmContent` from the `ToolResult` is packaged as a `FunctionResponse` and sent back to the Gemini model so it can continue generating a user-facing response.
+9. **Display to User:** The `returnDisplay` from the `ToolResult` is sent to the CLI to show the user what the tool did.
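+
+Steps 3 through 9 can be condensed into a single async function. In this sketch, `askUser` stands in for the CLI round trip of step 5. The `FunctionResponse` shape (with an `output` field) is an assumption, and so is the way lookup, validation, and cancellation failures are reported back to the model.
+
+```typescript
+interface ToolResult {
+  llmContent: string;
+  returnDisplay: string;
+}
+
+// Hypothetical minimal tool surface used by the flow.
+interface FlowTool {
+  validateToolParams(params: Record<string, unknown>): string | null;
+  shouldConfirmExecute(params: Record<string, unknown>): { title: string } | false;
+  execute(params: Record<string, unknown>, signal: AbortSignal): Promise<ToolResult>;
+}
+
+interface FunctionCall {
+  name: string;
+  args: Record<string, unknown>;
+}
+
+// Loosely modeled on a Gemini functionResponse part; the `output` field is an assumption.
+interface FunctionResponse {
+  name: string;
+  response: { output: string };
+}
+
+interface CallOutcome {
+  forModel: FunctionResponse; // step 8
+  forUser: string; // step 9
+}
+
+async function handleFunctionCall(
+  call: FunctionCall,
+  tools: Map<string, FlowTool>,
+  askUser: (title: string) => Promise<boolean>, // stands in for the CLI prompt in step 5
+  signal: AbortSignal,
+): Promise<CallOutcome> {
+  const outcome = (output: string, display: string): CallOutcome => ({
+    forModel: { name: call.name, response: { output } },
+    forUser: display,
+  });
+
+  // Step 3: look the tool up by name.
+  const tool = tools.get(call.name);
+  if (!tool) return outcome(`Unknown tool: ${call.name}`, `Unknown tool: ${call.name}`);
+
+  // Step 4: validate parameters before doing anything else.
+  const problem = tool.validateToolParams(call.args);
+  if (problem !== null) return outcome(`Invalid parameters: ${problem}`, problem);
+
+  // Step 5: ask the user only when the tool requests confirmation.
+  const confirmation = tool.shouldConfirmExecute(call.args);
+  if (confirmation !== false && !(await askUser(confirmation.title))) {
+    return outcome("The user cancelled this tool call.", "Cancelled.");
+  }
+
+  // Steps 6 and 7: execute with the abort signal and collect the ToolResult.
+  const result = await tool.execute(call.args, signal);
+  // Steps 8 and 9: llmContent goes to the model, returnDisplay to the CLI.
+  return outcome(result.llmContent, result.returnDisplay);
+}
+```
+
+Reporting failures as ordinary function responses, rather than throwing, lets the model see what went wrong and adjust its next request.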
+
+## Extending with Custom Tools
+
+Registering new tools programmatically is not documented as an end-user workflow, but the architecture supports extension through:
+
+- **Command-based Discovery:** Advanced users or project administrators can define a `toolDiscoveryCommand` in `settings.json`. This command, when run by the Gemini CLI server, should output a JSON array of `FunctionDeclaration` objects. The server will then make these available as `DiscoveredTool` instances. The corresponding `toolCallCommand` would then be responsible for actually executing these custom tools.
+- **MCP Server:** For more complex scenarios, an MCP server can be set up (configured via `mcpServerCommand`) to expose tools that the Gemini CLI server can then discover and use.
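+
+For command-based discovery, the server's side of the contract is to parse the command's JSON output and wrap each declaration so that calls are routed to `toolCallCommand`. The sketch below does that, with command execution injected as a callback. The document does not say how the real server passes the tool name and arguments to `toolCallCommand`. `CommandRunner`, `SketchDiscoveredTool`, and the rule of skipping entries without a `name` are therefore assumptions.
+
+```typescript
+interface FunctionDeclaration {
+  name: string;
+  description?: string;
+  parameters?: Record<string, unknown>; // JSON schema
+}
+
+// Hypothetical runner for toolCallCommand. How the real server passes the tool
+// name and arguments to that command is not documented here.
+type CommandRunner = (command: string, toolName: string, argsJson: string) => Promise<string>;
+
+// Stand-in for DiscoveredTool: its schema came from discovery, its execution is delegated.
+class SketchDiscoveredTool {
+  constructor(
+    readonly declaration: FunctionDeclaration,
+    private readonly toolCallCommand: string,
+    private readonly run: CommandRunner,
+  ) {}
+
+  execute(args: Record<string, unknown>): Promise<string> {
+    return this.run(this.toolCallCommand, this.declaration.name, JSON.stringify(args));
+  }
+}
+
+// Parse the stdout of toolDiscoveryCommand, which should be a JSON array of
+// FunctionDeclaration objects. Entries without a non-empty string `name` are skipped.
+function parseDiscoveredTools(stdout: string, toolCallCommand: string, run: CommandRunner): SketchDiscoveredTool[] {
+  const parsed: unknown = JSON.parse(stdout);
+  if (!Array.isArray(parsed)) {
+    throw new Error("toolDiscoveryCommand must print a JSON array");
+  }
+  const tools: SketchDiscoveredTool[] = [];
+  for (const entry of parsed) {
+    if (typeof entry === "object" && entry !== null && typeof entry.name === "string" && entry.name !== "") {
+      tools.push(new SketchDiscoveredTool(entry as FunctionDeclaration, toolCallCommand, run));
+    }
+  }
+  return tools;
+}
+```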
+
+This tool system provides a flexible and powerful way to augment the Gemini model's capabilities, making the Gemini CLI a versatile assistant for a wide range of tasks.