# esmole + monorepo design

Date: 2026-06-16
Status: approved, pending implementation plan

## Goal

Add Elasticsearch as a second MCP server (`esmole-mcp`) alongside the existing
`dbmole-mcp` (PostgreSQL/MySQL). Restructure the repo into an npm-workspaces
monorepo so both servers share infrastructure (connection store, manager cache,
SSH tunnel, MCP plumbing) without forcing Elasticsearch into the SQL-shaped
`Driver` interface.

The agent sees two distinct MCP servers (two entries in its MCP config, each with
its own `bin`). They live in one repo and share a private `core` package.

### Why not reuse the SQL `Driver` for ES

`Driver` is relational (`query(sql)`, `listDatabases`, `describeTable` with
PK/FK/indexes). Elasticsearch is a document/search store. Mapping `_search`→query,
indices→tables, mappings→describe gives a crippled, dishonest contract. ES gets
its own thin `Backend` abstraction instead; only the generic plumbing is shared.

### Reference

`../homelab/es-mcp` (Python/FastMCP) is the tool-surface baseline: a generic REST
passthrough plus four helpers, all returning `{status, body}` and never raising
on 4xx/5xx. esmole keeps that tool surface but changes the transport and the
connection model: stdio transport (instead of HTTP + bearer token), and multiple
named connections with the store and SSH tunnel inherited from dbmole (instead
of a single connection configured from env).

## Scope decisions

- **ES versions:** 7.x and 8.x. Passthrough core is version-agnostic; helpers work
  on both. No ES|QL (`_query` is 8.11+, absent in 7.x).
- **Use cases:** read/debug + full CRUD + cluster ops, all reachable through the
  generic passthrough; helpers cover the common read paths.
- **Improvements over the reference (all in scope):** output truncation / token
  budget, per-connection `readonly` guard, mapping flatten (field:type list),
  search projection (`_source` filter) + aggs-only mode.
- **Restructure:** full workspaces immediately: move existing `src` into
  `packages/dbmole-mcp`, extract `core`.

## Architecture (Approach A: generic core + injected schema)

`core` owns the hard, backend-agnostic machinery; each leaf package supplies a
thin backend factory, its own connection schema, and its own tool set.

### §1. Repo layout

```
dbmole-mcp/                 # repo root, private, workspaces: ["packages/*"]
  package.json              # shared devDeps + scripts (lint/test/build all)
  biome.json                # shared
  tsconfig.base.json        # shared compiler options
  packages/
    core/                   # @dbmole/core: private, NOT published
    dbmole-mcp/             # public npm, bin: dbmole-mcp
    esmole-mcp/             # public npm, bin: esmole-mcp
```

`core` is **not published**. It is bundled into each leaf package via tsup
(`noExternal: [/@dbmole\/core/]`) so the published packages are self-contained,
with no inter-package version coupling and no third package to publish. There are
two public npm names (`dbmole-mcp` unchanged, `esmole-mcp` new); `core` exists
only inside the repo.

### §2. core public surface (the generic seam)

The current `registry`/`store`/`sources` import `connectionConfigSchema` directly
and hardcode `dbmole:` log prefixes and `DBMOLE_STORE` / `DBMOLE_CONNECTIONS`
env-var names. Generalizing them means injecting the schema and the
`storePath` / `envVar` / `logPrefix` values as dependencies.

- `createRegistry({ storePath, configPath, env, schema, logPrefix, envVar })`:
  the schema is injected; there is no direct import of any concrete schema.
- `createManager<TBackend extends { dispose(): Promise<void> }>(registry,
  { createBackend, createTunnel, resolvePort })`: generic over the backend; it does
  not know `Driver`. `resolvePort(config)` is injected because the manager itself
  calls `defaultPort(config.type)` today (`manager.ts:67`); ES needs 9200, SQL
  5432/3306.
- `baseConnectionShape`: a zod raw shape **without** `type`, **including** the
  `ssh` field (`sshConfigSchema` moves to core; the tunnel is already SQL-free
  except for the `SshConfig` type at `tunnel.ts:5`). Each package spreads it, adds
  its own `type` enum and engine-specific fields, then calls `.strict()`.
- `openTunnel` / `Tunnel`: unchanged (pure TCP).
- `respond`: unchanged, already generic.
- `withManaged<TBackend, TConfig>(manager, name, fn, { isStaleError, formatError })`:
  generic. It is **not** unchanged: today it imports SQL `DriverDisposedError`,
  `ManagedConnection`, and `formatDbError(config.type, …)` (`managed.ts:25,34`).
  Core exports a backend-neutral stale-error class; stale detection and error
  formatting are injected by each package (or stale-retry moves into the manager
  behind that neutral error).
- `registerConnectionTools(server, { manager, registry, fullSchema, patchSchema,
  publicView, descriptions, ping, formatError })`: generic connection CRUD
  (list / add / remove / update / test_connection). The current tools bake in SQL
  patch fields, the SQL public view (`database`), SQL default-port rendering, SQL
  descriptions, `serverVersion()`, and SQL error formatting
  (`connections.ts:21,61,131`); all of these are package-owned and injected. Core
  only orchestrates registry + manager calls. `ping(backend)` backs
  `test_connection` (SQL `serverVersion()` vs ES `GET /`).
- Format split: only truncation / token-budget helpers go to core (`clampLimit`,
  `truncateRows`, `truncateJsonBudget`). SQL-shaped `normalizeCell` /
  `formatDbError` (`format.ts:16,36`) stay in dbmole-mcp.
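
The injected-factory idea behind `createManager` can be illustrated with a minimal sketch. This is not the existing manager: it takes a plain `Map` of configs in place of the registry, omits tunnels, rotation and stale-retry, and every name other than `createBackend` and `resolvePort` is hypothetical.

```typescript
// The only thing the manager requires of a backend.
interface Disposable {
  dispose(): Promise<void>;
}

// Assumed minimal config shape for this sketch.
interface BaseConfig {
  name: string;
  host: string;
  port?: number;
}

interface ManagerDeps<TBackend extends Disposable, TConfig extends BaseConfig> {
  // Builds a live backend for a resolved host/port.
  createBackend(target: {
    config: TConfig;
    connectHost: string;
    connectPort: number;
  }): Promise<TBackend>;
  // Supplies the engine default when the config omits a port.
  resolvePort(config: TConfig): number;
}

function createManager<TBackend extends Disposable, TConfig extends BaseConfig>(
  configs: Map<string, TConfig>, // stand-in for the registry
  deps: ManagerDeps<TBackend, TConfig>,
) {
  const cache = new Map<string, Promise<TBackend>>();
  return {
    get(name: string): Promise<TBackend> {
      const cached = cache.get(name);
      if (cached) return cached;
      const config = configs.get(name);
      if (!config) return Promise.reject(new Error(`unknown connection: ${name}`));
      const created = deps.createBackend({
        config,
        connectHost: config.host,
        connectPort: config.port ?? deps.resolvePort(config),
      });
      cache.set(name, created);
      // Forget failed attempts so the next call retries.
      created.catch(() => {
        if (cache.get(name) === created) cache.delete(name);
      });
      return created;
    },
    async disposeAll(): Promise<void> {
      const pending = [...cache.values()];
      cache.clear();
      await Promise.all(pending.map(async (p) => (await p).dispose()));
    },
  };
}
```

Because the manager only ever calls `dispose()`, the same code can serve a SQL `Driver` and an ES `Backend`; the engine-specific parts arrive through `deps`.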

### §3. Manager generalization

The manager needs only `dispose()` from a backend. All the hard logic (cache,
rotation, dispose-race handling, tunnel guards, retry-on-stale;
`manager.ts:82-129`) moves verbatim into core. Changes: `defaultCreateDriver`
becomes the injected `createBackend(target)`; the internal
`defaultPort(config.type)` call (`manager.ts:67`) becomes the injected
`resolvePort(config)`. The `tunnel?.isClosed()` recheck stays.

`DriverTarget` becomes `BackendTarget { config, connectHost, connectPort, serverName }`,
with the config type as a generic parameter. `connectHost`/`connectPort` are where
the client actually dials: the tunnel's `127.0.0.1:localPort` when tunneled, else
the real host/port. `serverName` is the original `config.host`, carried through for
TLS SNI / certificate hostname verification. Without this split, HTTPS
Elasticsearch over an SSH tunnel fails `verifyTls`, because the cert covers the
real host, not `127.0.0.1` (`tunnel.ts:170`). SQL drivers ignore `serverName`; the
ES client sets it as the TLS servername.
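
The dial-address versus server-name split can be sketched as follows. `buildTarget` and `TunnelHandle` are hypothetical names for illustration; only the `BackendTarget` fields come from the text above.

```typescript
interface BackendTarget<TConfig> {
  config: TConfig;
  connectHost: string; // where the client dials
  connectPort: number;
  serverName: string; // original host, for SNI / cert hostname checks
}

// Hypothetical minimal tunnel handle; the real `Tunnel` may expose more.
interface TunnelHandle {
  localPort: number;
}

function buildTarget<TConfig extends { host: string; port?: number }>(
  config: TConfig,
  resolvePort: (config: TConfig) => number,
  tunnel?: TunnelHandle,
): BackendTarget<TConfig> {
  const realPort = config.port ?? resolvePort(config);
  return {
    config,
    // Tunneled connections dial the local end of the tunnel.
    connectHost: tunnel ? "127.0.0.1" : config.host,
    connectPort: tunnel ? tunnel.localPort : realPort,
    // The certificate still names the real host in both cases.
    serverName: config.host,
  };
}
```

With a tunnel on local port 40123 and `host: "es.internal"`, the client would dial `127.0.0.1:40123` while verifying the certificate against `es.internal`.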

The SQL `Driver` interface stays in `dbmole-mcp`. ES implements its own `Backend`.
Both satisfy `{ dispose(): Promise<void> }`, so both ride the same manager: ES
inherits multi-connection, named connections, SSH tunnel, runtime `add_connection`,
and the store for free (an upgrade over the single-connection reference).

### §4. Connection schema split

- **base (core):** `name`, `host`, `port?`, `user` (required), `password?`,
  `readonly`, `ssh`: exactly dbmole's current fields minus `database`, so dbmole's
  behavior is unchanged. (`database` leaves the base because it is SQL-specific.)
- **dbmole:** base + `type: enum(['postgres','mysql'])` + `database?`;
  `defaultPort` 5432 / 3306. No override of base fields.
- **esmole:** base with `user` overridden to optional, +
  `type: enum(['elasticsearch'])` +
  `scheme: enum(['http','https']).default('https')` +
  `verifyTls: boolean.default(true)` + `apiKey?` (sent as `Authorization: ApiKey
  <value>`), plus a `.refine` requiring user/password **or** apiKey;
  `defaultPort` 9200.
- `registry.update`'s engine-switch port-drop (`registry.ts:130`) is already
  generic (any `type` change without an explicit `port` drops the old port).
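
The esmole-specific rules can be sketched in plain TypeScript, as a dependency-free stand-in for the zod schemas. The sketch omits `ssh`, assumes "user/password" means both fields are present, and uses hypothetical helper names (`hasEsCredentials`, `withEsDefaults`).

```typescript
// Base fields shared by both packages (ssh omitted for brevity).
interface BaseConnection {
  name: string;
  host: string;
  port?: number;
  user: string;
  password?: string;
  readonly: boolean;
}

// esmole overrides `user` to optional and adds its own fields.
type EsConnection = Omit<BaseConnection, "user"> & {
  type: "elasticsearch";
  user?: string;
  scheme: "http" | "https";
  verifyTls: boolean;
  apiKey?: string;
};

// Mirrors the refine rule: user/password or apiKey must be present.
function hasEsCredentials(
  c: Pick<EsConnection, "user" | "password" | "apiKey">,
): boolean {
  const hasBasic = Boolean(c.user) && Boolean(c.password);
  return hasBasic || Boolean(c.apiKey);
}

type EsConnectionInput = Pick<EsConnection, "name" | "host"> &
  Partial<Omit<EsConnection, "name" | "host" | "type">>;

// Applies the stated defaults; explicit input values win.
function withEsDefaults(input: EsConnectionInput): EsConnection {
  return {
    type: "elasticsearch",
    readonly: false,
    scheme: "https",
    verifyTls: true,
    ...input,
  };
}
```

In the real schema the credential check would live in the `.refine` callback and the defaults in the zod field definitions; the logic is the same.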

### §5. esmole backend + tools

**Backend** = an HTTP client (undici, keep-alive + `dispose()`) dialing
`scheme://connectHost:connectPort` (the tunnel endpoint when tunneled), with auth
(basic or apiKey) and `verifyTls`. When `verifyTls` is on and the connection is
tunneled, the client sets the TLS servername to `BackendTarget.serverName` (the
real ES host) so certificate hostname verification passes (see §3).
`request(method, path, { body, params })` → `{status, body}`, never throwing on
4xx/5xx; the response body is parsed as JSON when possible, else returned as text.
A `string` request body is sent as-is (for NDJSON `_bulk`); an object or array is
JSON-serialized.
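
The body handling described above can be sketched without any HTTP client. `encodeBody` and `decodeBody` are hypothetical helper names; the actual transport (undici) is left out.

```typescript
type EsResponse = { status: number; body: unknown };

// Strings pass through untouched (NDJSON for _bulk); everything else is JSON.
function encodeBody(body: unknown): string | undefined {
  if (body === undefined) return undefined;
  return typeof body === "string" ? body : JSON.stringify(body);
}

// Never throws: non-JSON payloads (proxy error pages, _cat text) come back as text.
function decodeBody(status: number, text: string): EsResponse {
  try {
    return { status, body: JSON.parse(text) };
  } catch {
    return { status, body: text };
  }
}
```

A 4xx/5xx response goes through the same `decodeBody` path as a 2xx one, so the caller always receives `{status, body}` and decides what to do with the status.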

**readonly guard** (`es/guard.ts`, role analogous to `sqlGuard`): a method+path
boundary. When the connection is `readonly`, allow GET/HEAD plus POST to a
read-suffix allowlist (`_search`, `_msearch`, `_count`, `_field_caps`, `_cat`,
`_mapping`, `_search/scroll`, `_pit`) plus DELETE limited to `_pit` and
`_search/scroll` (point-in-time / scroll cleanup: read-session teardown, not data
mutation). Block all other PUT/DELETE and any other POST. `_sql` is blocked by
absence from the allowlist (it can write). Allowlist (not blocklist) so unknown
endpoints fail safe. Script content inside a `_search` body is content-level, not
method-level, and is out of scope for this guard.
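
A minimal sketch of such a guard follows. It takes the allowlist verbatim from the paragraph above and matches on the first path segment that starts with `_`, which is one possible reading of "read-suffix"; the function names and matching rule are assumptions, not the contents of `es/guard.ts`.

```typescript
// POST is allowed only when the API segment is on this list.
const READ_POST_APIS = new Set([
  "_search", // also covers _search/scroll
  "_msearch",
  "_count",
  "_field_caps",
  "_cat",
  "_mapping",
  "_pit",
]);

// Returns the path segments from the first "_api" segment onward.
function apiSegments(path: string): string[] {
  const [pathname = ""] = path.split("?");
  const segments = pathname.split("/").filter(Boolean);
  const start = segments.findIndex((s) => s.startsWith("_"));
  return start === -1 ? [] : segments.slice(start);
}

function isAllowedWhenReadonly(method: string, path: string): boolean {
  const verb = method.toUpperCase();
  if (verb === "GET" || verb === "HEAD") return true;
  const [api, sub] = apiSegments(path);
  if (api === undefined) return false; // e.g. PUT /index or DELETE /index
  if (verb === "POST") return READ_POST_APIS.has(api);
  if (verb === "DELETE") {
    // Only read-session teardown: point-in-time and scroll cleanup.
    return api === "_pit" || (api === "_search" && sub === "scroll");
  }
  return false; // PUT and anything unknown fail safe
}
```

Because the default branch returns `false`, an endpoint the guard has never heard of is rejected on a `readonly` connection rather than let through.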

**Tools (5 ES-specific + connection CRUD from core):**

| tool | wraps | improvement |
|---|---|---|
| `es_request` | generic passthrough | readonly guard + truncation |
| `es_search` | `POST /{index}/_search` | `_source` projection + aggs-only (size:0) + truncation |
| `es_list_indices` | `GET /_cat/indices` | - |
| `es_get_mapping` | `GET /{index}/_mapping` | flatten to field:type list (default); `raw?` for nested JSON |
| `es_cluster_health` | `GET /_cluster/health` | - |

Index is always explicit; there is no default index. `es_request` is the primary
tool and covers the entire ES REST surface; helpers are sugar for common reads.
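
The field:type flattening listed for `es_get_mapping` could look roughly like this. It is a sketch under assumptions: `flattenProperties` is a hypothetical name, it operates on the `properties` object of one index's mapping, and it lists multi-fields (`fields`) alongside object sub-fields.

```typescript
interface FieldMapping {
  type?: string;
  properties?: Record<string, FieldMapping>; // object / nested sub-fields
  fields?: Record<string, FieldMapping>; // multi-fields such as `.keyword`
}

function flattenProperties(
  properties: Record<string, FieldMapping>,
  prefix = "",
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, field] of Object.entries(properties)) {
    const path = prefix ? `${prefix}.${name}` : name;
    // Plain object fields carry no explicit type; only their children are listed.
    if (field.type) out[path] = field.type;
    if (field.properties) Object.assign(out, flattenProperties(field.properties, path));
    if (field.fields) Object.assign(out, flattenProperties(field.fields, path));
  }
  return out;
}
```

For a mapping with a `title` text field that has a `keyword` multi-field and a `user` object containing `id`, the result would be `title: text`, `title.keyword: keyword`, `user.id: keyword`.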

**truncation:** cap response by byte budget and hit count, set `truncated: true`,
mirroring dbmole's row truncation.
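
One way to combine the two caps is sketched below. `truncateHits` is a hypothetical helper, not the planned `truncateJsonBudget`, and it approximates the byte budget by the length of each hit's JSON serialization; a real implementation would measure encoded bytes.

```typescript
interface TruncatedHits<T> {
  hits: T[];
  truncated: boolean;
}

function truncateHits<T>(
  hits: T[],
  maxHits: number,
  maxSize: number, // budget in serialized characters (byte approximation)
): TruncatedHits<T> {
  const kept: T[] = [];
  let used = 0;
  for (const hit of hits) {
    if (kept.length >= maxHits) break;
    const size = JSON.stringify(hit).length;
    if (used + size > maxSize) break;
    kept.push(hit);
    used += size;
  }
  // Flag the result whenever anything was dropped, for either reason.
  return { hits: kept, truncated: kept.length < hits.length };
}
```

The caller can attach `truncated` to the tool response so the agent knows to narrow the query or page further.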

### §6. Distribution / entry / docker

- Each leaf package has its own stdio entry and `bin`. `esmole-mcp` →
  `dist/index.js`.
- **Docker:** per-package Dockerfile, self-contained (core is bundled in). A root
  multi-image build is optional/later.
- **npm:** `dbmole-mcp` (unchanged), `esmole-mcp` (new), both public; `core`
  private.

### §7. Testing

- Per-package vitest projects: unit (mocked IO) + integration (testcontainers).
  esmole integration runs against ES 7.x **and** 8.x containers. Coverage stays at
  ≥90% lines/functions per package; thresholds are never lowered.
- Manager concurrency tests move into `core` alongside the manager.
- Remap / alias test import paths **before** moving files, not after. Integration
  tests reference the old `src/...` paths; if files move first, the ≥90% gate
  breaks mid-migration.

## Defaults

- ES `readonly` default `false` (matches dbmole).
- Auth: user/password primary, `apiKey` optional.
- `scheme` default `https` (8.x-friendly); 7.x-over-http sets `http` explicitly.

## Migration order (high level; detailed plan via writing-plans)

1. Workspaces skeleton; `git mv src` into `packages/dbmole-mcp` (history
   preserved); tests green.
2. Extract `core` (config/manager/tunnel/respond/format/connection-tools), inject
   the schema; dbmole depends on core; tests green.
3. Scaffold esmole: schema → backend → guard → tools → entry, TDD.
4. Docker + publish config.

## Out of scope

- ES|QL (`_query`) helper: 8.x-only, deferred.
- OpenSearch-specific testing: passthrough likely works, but helpers are not
  validated against it.
- Publishing `core` as a standalone package.
- Cross-server unified config (each server keeps its own store namespace:
  `ESMOLE_STORE` / `ESMOLE_CONNECTIONS` vs `DBMOLE_STORE` / `DBMOLE_CONNECTIONS`).