---
url: 'https://symtether.dev/guide.md'
---

# Guide

> **What symtether is.** A one-page open spec for `#sym:`, a portable
> markdown link fragment that points at a named symbol in source
> code, e.g., `[fetchData](src/client.ts#sym:ApiClient.fetchData)`.
> It also ships the reference toolkit that enforces the spec. The CLI
> verifies every ref against the code at three tiers. Tier one is AST
> resolution via tree-sitter for 18 languages, tier two is lexical
> search for everything else, and tier three is file-only when the
> fragment cannot be checked. It runs on any repo with
> `npx symtether check`, needs no config, no repo indexing, and no
> native compile, and fails CI when a ref is broken.

symtether validates `#sym:` references in markdown. These are links that point at a specific function, class, method, type, or constant in a source file. When one breaks, symtether fails CI.

## Install and first run

You do not need to install symtether. Run it with npx:

```console
npx symtether check
```

Exit codes:

* `0`. All refs pass.
* `1`. Broken refs, stale refs under `--strict`, or an outdated sum file under `update --check`.
* `2`. Usage or runtime error.

Default scope is every `**/*.md` in the repo. Exclusions come from your `.gitignore`, plus `node_modules`, which is always skipped ([GLOB\_OPTIONS](/src/check.ts#sym:const:GLOB_OPTIONS)).
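
The exit codes listed above can be folded into a CI wrapper script. The following is a minimal sketch, not part of symtether: `describeExit` and `shouldBlockMerge` are hypothetical helper names, and the labels they return are illustrative.

```typescript
// Hypothetical helpers for a CI wrapper; symtether itself does not export these.
type ExitMeaning = 'ok' | 'check-failed' | 'usage-or-runtime-error' | 'unknown';

// Maps the documented exit codes of the symtether CLI to a short label.
function describeExit(code: number | null): ExitMeaning {
  switch (code) {
    case 0:
      return 'ok'; // all refs pass
    case 1:
      return 'check-failed'; // broken refs, stale refs under --strict, or outdated sum file
    case 2:
      return 'usage-or-runtime-error';
    default:
      return 'unknown'; // e.g. null when the process was killed by a signal
  }
}

// Assumption: a wrapper treats anything other than a clean pass as blocking.
function shouldBlockMerge(code: number | null): boolean {
  return describeExit(code) !== 'ok';
}
```

In a Node script, `code` would typically be the `status` field returned by `child_process.spawnSync`, which is `null` when the process ends on a signal.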
## Commands

```console
npx symtether check [globs…]       # validate refs; exit 1 on broken
npx symtether check --json         # stable machine output
npx symtether fix [globs…]         # propose repairs (dry-run)
npx symtether fix --write          # apply them
npx symtether fix --canonicalize   # also rewrite compat-form refs to #sym:
npx symtether init                 # install the agent block into AGENTS.md
npx symtether init --ci            # + a GitHub Actions workflow
npx symtether update [targets…]    # stamp review: (re)generate symtether.sum
npx symtether update --check       # CI: fail if symtether.sum is out of date
npx symtether check --strict       # also fail when stamped targets changed
npx symtether check --strict=warn  # …or just report staleness
```

The CLI calls the same functions the library exports:

```ts
import { check } from 'symtether';

const report = await check({ cwd: '/path/to/repo' });
```

## Resolution tiers

Every ref resolves at one of three tiers, and the tier is part of the output. Anything that could not be fully verified shows up as `lexical` or `file-only` rather than passing quietly ([Resolver](/src/resolve.ts#sym:class:Resolver)):

| Tier | When | Meaning |
|---|---|---|
| `ast` | TypeScript, TSX, JavaScript, Python, Go, Rust, Java, Kotlin, Swift, Ruby, PHP, C, C++, C#, Scala, Elixir, Lua, Bash | Symbol verified against the parsed AST |
| `lexical` | any other text file | Word-boundary match for the symbol name |
| `file-only` | fragment not checkable | Path existence only, reported as a warning |

Adding a tier-1 language is mostly a grammar import plus fixtures ([loadLanguage](/src/languages/index.ts#sym:fn:loadLanguage)). See [Adding a language](./adding-a-language.md) for the walkthrough. Open an issue if yours is missing.

The prerequisite is a WASM build of the grammar. Most grammars ship prebuilt on npm. Swift's does not, so we compile and vendor it ourselves. Dart has no usable WASM build at all, so it resolves at tier 2. Renames and deletions still get caught there, without awareness of nesting.
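
To make the tier-2 behavior concrete, here is a rough sketch of a word-boundary match. It is not symtether's implementation: `lexicalMatch` and `escapeRegExp` are hypothetical names, and searching only the final dotpath segment is an assumption made to illustrate "without awareness of nesting". The identifier character set is taken from the spec's segment grammar.

```typescript
// Sketch only: the toolkit's real tier-2 matcher is not shown in this guide.
// Identifier characters follow the spec's segment grammar: [A-Za-z0-9_$].
const IDENT_CHARS = 'A-Za-z0-9_$';

// Escapes regex metacharacters so a name such as `$scope` is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns true when the symbol name appears in `source` as a whole identifier.
function lexicalMatch(source: string, dotpath: string): boolean {
  // Assumption: with no notion of nesting, only the last segment is searched.
  const segments = dotpath.split('.');
  const name = segments[segments.length - 1];
  // Custom boundaries instead of \b, because \b does not treat `$` as a word character.
  const pattern = new RegExp(
    '(?<![' + IDENT_CHARS + '])' + escapeRegExp(name) + '(?![' + IDENT_CHARS + '])'
  );
  return pattern.test(source);
}
```

Under this sketch a rename from `fetchData` to `fetchDataAll` no longer matches, so the ref breaks, while a `fetchData` defined in the wrong class would still pass. That gap is what the `ast` tier closes.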
### Kind mapping

The optional `<kind>` disambiguator (`#sym:fn:parse`) filters matches by what the definition is. The four kinds are deliberately coarse, because they exist to break ties rather than to classify. Each kind accepts these definition kinds from the underlying grammars ([KIND\_MAP](/src/languages/index.ts#sym:const:KIND_MAP)):

| `<kind>` | Accepts | Examples |
|---|---|---|
| `fn` | function, method, macro | a Go func, a Python method, a Rust `macro_rules!` |
| `class` | class, struct, object | a TS class, a C struct, a Kotlin object, a C# record |
| `type` | interface, type, enum, module, class, struct, object | a TS interface, a Rust enum, a Go type, a C++ namespace |
| `const` | constant, field, property, variable | a Go const, a Java field, a Scala val, a Python class attribute |

The overlaps are intentional. `class` and `type` both accept classes and structs, since a class is a type. Languages also disagree about what counts as a "constant" versus a "field", so `const` accepts both rather than making authors guess which capture kind a grammar emits.

If a kind filter eliminates every match, the error names the kinds that do exist:

```
✗ src/server.go#sym:class:NewServer  BROKEN (line 3)
    file OK; "NewServer" exists but is not a class (found: function)
```

## Teaching your agents

```console
npx symtether init
```

installs a short managed block into `AGENTS.md`. Re-running it updates the block in place, and it does not duplicate the block or touch anything outside the markers. The block tells agents to resolve refs by grepping, to run `check` and `fix` after renaming symbols, and to prefer `#sym:` refs over line numbers when writing docs.

To catch what agents miss, add the CI workflow.

```console
npx symtether init --ci
```

## Staleness detection

By default `check` fails only on broken refs. If you also want to find out when the implementation behind a ref changes, use the sum file. The flow is:

1. 
   `npx symtether update` writes `symtether.sum`, which holds a normalized content hash ([hashDefinition](/src/checksum.ts#sym:fn:hashDefinition)) for every resolvable ref. Reformatting does not change a hash. Renaming does not either, because the hash excludes the symbol's own name. That is what lets `fix` detect renames by content.
2. `npx symtether check --strict` marks refs stale when their target's hash no longer matches, and lists every doc referencing the changed target. `--strict=warn` reports without failing.
3. Re-read the prose, fix it or confirm it, then re-stamp with `npx symtether update [targets…]`.

The sum file is optional. If a repo never runs `update`, the sum file is never written, and `check` still runs against the markdown links. When the sum file does exist, it holds derived checksums, not decisions. `go.sum` uses the same idea. If you delete the sum file, `check` passes or fails exactly as before, and the next `update` writes the sum file back.

## Limits

* symtether guarantees the pointer resolves. It does not guarantee the prose around the pointer is still true. `--strict` flags refs whose implementation changed, but you or your agents judge whether the prose still holds.
* Resolution checks that a definition exists in the linked file. There is no import following or re-export chasing, so a symbol re-exported but not defined in the linked file counts as broken. Link to the defining file instead.

---

---
url: 'https://symtether.dev/adding-a-language.md'
---

# Adding a language

> **How to widen `#sym:` AST coverage to a new language.** The spec
> is language-agnostic. The reference toolkit resolves refs at tier
> one (AST) whenever it has a tree-sitter grammar for the file.
> Adding a tier-one language takes four steps:
>
> 1. Add the tree-sitter grammar as a devDependency.
> 2. Register it in `scripts/copy-grammars.mjs` so its WASM and tags
>    query get copied at build time.
> 3. 
>    Register the file extensions in
>    [`SPECS`](/src/languages/index.ts#sym:const:SPECS).
> 4. Add fixtures.
>
> The resolver has no per-language logic, so languages are data. The
> grammar must ship a prebuilt WASM on npm, and symtether never
> compiles grammars at install time.

symtether resolves refs at three tiers. A file whose grammar is not bundled falls back to tier 2, and tier 2 still catches renames and deletions. Adding a tier-1 grammar takes four steps.

The grammar registry lives in [loadLanguage](/src/languages/index.ts#sym:fn:loadLanguage). Every language is data in the same file, and the resolver has no per-language logic. The steps to reach tier 1 are:

1. add a dev-dependency for the grammar,
2. teach [`scripts/copy-grammars.mjs`](/scripts/copy-grammars.mjs#sym:const:grammars) how to copy the WASM and the tags query,
3. register the extension in the [`SPECS`](/src/languages/index.ts#sym:const:SPECS) table,
4. add test fixtures.

## Prerequisites

The grammar package must ship a prebuilt WASM. Most tree-sitter grammars on npm do. If yours does not, the tool would have to compile the grammar at install time, and this project does not do that (see the design laws in AGENTS.md). You have two options.

* **Vendor the WASM under `vendor/grammars/`** and register it in the [`vendored`](/scripts/copy-grammars.mjs#sym:const:vendored) list in `scripts/copy-grammars.mjs`. Swift takes this route because upstream publishes no WASM. The vendor script builds the grammar in Docker and commits the artifact.
* **Skip the language.** Tier 2 already catches most breakage in docs.

## 1. Add the grammar as a dev-dependency

Grammars go in `devDependencies`, never in `dependencies`. Their `install` scripts run `node-gyp` to build native bindings that this project never uses, and running that step on Cloudflare Workers Builds breaks the build. The `.npmrc` sets `ignore-scripts=true` so the native step is skipped.
The published symtether package then ships only the WASM, which is why the grammar can be a dev-dependency. [copy-grammars.mjs](/scripts/copy-grammars.mjs#sym:const:grammars) extracts the WASM at build time and writes it into `grammars/`.

```sh
npm install --save-dev --ignore-scripts tree-sitter-<lang>
```

## 2. Copy the grammar at build time

Edit [copy-grammars.mjs](/scripts/copy-grammars.mjs#sym:const:grammars) and add a row to the `grammars` array. The row has four fields, in this order: package name, WASM filename, output basename, extra query names.

```js
['tree-sitter-<lang>', 'tree-sitter-<lang>.wasm', '<lang>', ['<lang>']],
```

The output basename is what the runtime looks up. Keep it short and lowercase. Include an entry in the extra query names only if you need a supplemental `queries/<lang>.extra.scm` file. The upstream grammar's own `tags.scm` is used automatically. If the upstream grammar ships no `tags.scm` at all (Kotlin and Bash), then your `.extra.scm` becomes the whole query.

## 3. Register the extension

Add one line to the [`SPECS`](/src/languages/index.ts#sym:const:SPECS) table inside [src/languages/index.ts](/src/languages/index.ts#sym:fn:loadLanguage).

```ts
'.<ext>': { grammar: '<lang>', tags: ['<lang>'] },
```

The `tags` array is a chain. Each entry names a `.tags.scm` file in `grammars/`, and the resolver concatenates them in order at load time. The TypeScript entries chain `['javascript', 'typescript']` because the TS `tags.scm` is authored as a supplement to the JS one. Only chain like this when your language has an equivalent inheritance.

## 4. Cover the four `#sym:` kinds

The `<kind>` disambiguator (`#sym:fn:foo`) filters by capture kind. The mapping table is [KIND\_MAP](/src/languages/index.ts#sym:const:KIND_MAP).

| `<kind>` | Accepts |
|---|---|
| `fn` | function, method, macro |
| `class` | class, struct, object |
| `type` | interface, type, enum, module, class, struct, object |
| `const` | constant, field, property, variable |

Your `tags.scm` (or the supplemental `queries/<lang>.extra.scm`) has to emit `@definition.` captures that match one of these four kinds. If the upstream grammar only emits `@definition.function` for methods and you want to tell methods and functions apart, add a supplemental query. See [queries/kotlin.extra.scm](https://github.com/jutaz/symtether/blob/main/queries/kotlin.extra.scm) for a short example.

## 5. Add fixtures

The tier-1 coverage test in [test/languages.test.ts](https://github.com/jutaz/symtether/blob/main/test/languages.test.ts) drives every bundled grammar through the same fixture layout.

```
test/fixtures/basic/
  src/<name>.<ext>     # a small source file with a few definitions
  docs/languages.md    # ref lines pointing at the definitions
```

Add a source file that contains at least:

* one function,
* one class-like definition,
* one constant,
* one nested definition.

Then add ref lines to `docs/languages.md`, one for each `#sym:` shape you want to prove:

* a bare name,
* a dotpath,
* each kind filter (`fn`, `class`, `type`, `const`) that your language supports.

The test asserts that every ref resolves at the `ast` tier. A `lexical` or `broken` result fails the run.

## 6. Verify and dogfood

```sh
npm run build        # copies your WASM into grammars/
npx vitest run       # tier-1 coverage plus every other test
node dist/cli.js check   # smoke check against the repo
```

If your language now resolves at `ast`, update the language list in two places when you send the PR:

* the guide table and the [registry snippet](/src/languages/index.ts#sym:fn:loadLanguage),
* the [Guide's Resolution tiers section](./guide.md).

## Grammars that need vendoring

symtether never compiles a grammar. Users pull the prebuilt WASM that we already prepared, and the build in this repo does the same.
The one exception is Swift, which uses the vendor path. See [scripts/vendor-swift.mjs](https://github.com/jutaz/symtether/blob/main/scripts/vendor-swift.mjs). Grammars whose npm package emits no WASM cannot be added unless you vendor a prebuilt artifact the same way.

---

---
url: 'https://symtether.dev/spec.md'
---

# The `#sym:` reference syntax (SPEC v1)

A `#sym:` reference is a standard relative markdown link whose fragment names a symbol inside the target source file. It renders and clicks like any other link on GitHub, and symtether verifies each reference against the code.

## Canonical form

```
[link text](<path>#sym:<dotpath>)
[link text](<path>#sym:<kind>:<dotpath>)
```

* `<path>` is a standard markdown relative path to a source file, resolved **relative to the markdown file's own directory** (identical to GitHub rendering semantics). Paths beginning with `/` resolve from the repository root. The repo root is the nearest ancestor containing `.git`, or the working directory when there is none.
* `<dotpath>` is one or more identifiers joined by `.`, e.g. `ApiClient.fetchData`. Each segment must match `[A-Za-z0-9_$]+` and is compared case-sensitively.
* `<kind>` is an optional disambiguator from a **closed set**: `fn`, `class`, `type`, `const`. (`region` is reserved for future versions.) An unknown kind is a lint error, not ignored.

Examples:

```markdown
[fetch pattern](../src/api/client.ts#sym:ApiClient.fetchData)
[config parsing](src/config.ts#sym:fn:parseConfig)
[shared types](/packages/core/src/types.ts#sym:type:AgentSkill)
```

## Matching semantics

The dotpath is a **suffix match against the definition's nesting chain**, not a language-exact qualified name.

* The resolver extracts all named definitions in the target file as `(name, kind, nesting-chain, range)` tuples.
* A ref matches a definition if the dotpath segments equal the *trailing* segments of that definition's nesting chain. `ApiClient.fetchData` matches a method `fetchData` nested in class `ApiClient`, even if `ApiClient` is itself inside a namespace.
* If `<kind>` is present, the matched definition's kind must also map to it.
* **Exactly one match passes. Zero matches is broken. Two or more matches is ambiguous**, an error instructing the author to qualify further by adding a parent segment or a kind.

## Compatibility (lenient) forms, read-accepted and never written

On links whose target path resolves to a non-markdown source file, these fragments are accepted with identical semantics and reported with a `compat` note:

* `#Symbol` (bare)
* `#Type.method` (bare dotpath)

`symtether fix --canonicalize` rewrites them to `#sym:` form. Fragments on links targeting **markdown** files are heading anchors and are never treated as symbol refs.

## Out of scope

Line numbers or ranges (`#L10`), query parameters, version/commit pins, multiple symbols per link, wildcards, regex.

## Where refs are recognized

* Inline links and reference-style links (`[text][id]` + `[id]: path#sym:X`) in `.md` files.
* **Ignored:** links inside fenced code blocks and inline code spans, image links (`![]()`), autolinks, external URLs (any scheme), `mailto:`, and pure-fragment links (`#heading`).
* `` suppresses checking for refs on the following line. `` and `` suppress and restore checking for a block.
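
The canonical form and matching semantics described in this spec can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not the reference toolkit's code: `parseFragment`, `matchRef`, `KIND_ACCEPTS`, and the `Definition` shape are hypothetical names, and the kind table is copied from the guide's kind-mapping table.

```typescript
// Illustrative sketch only; every name below is hypothetical, not symtether's API.
type Kind = 'fn' | 'class' | 'type' | 'const';
type Outcome = 'pass' | 'broken' | 'ambiguous';

// Definition kinds each kind filter accepts, copied from the guide's kind-mapping table.
const KIND_ACCEPTS: Record<Kind, readonly string[]> = {
  fn: ['function', 'method', 'macro'],
  class: ['class', 'struct', 'object'],
  type: ['interface', 'type', 'enum', 'module', 'class', 'struct', 'object'],
  const: ['constant', 'field', 'property', 'variable'],
};

interface Definition {
  chain: string[]; // nesting chain, outermost first, ending with the definition's own name
  kind: string; // definition kind such as 'function' or 'method'
}

interface ParsedRef {
  kind?: Kind;
  segments: string[];
}

function isKind(value: string): value is Kind {
  return Object.prototype.hasOwnProperty.call(KIND_ACCEPTS, value);
}

// Parses '#sym:dotpath' or '#sym:kind:dotpath'; returns null when malformed.
function parseFragment(fragment: string): ParsedRef | null {
  const match = /^#sym:(?:([^:]*):)?([^:]*)$/.exec(fragment);
  if (match === null) return null;
  const kind = match[1];
  const segments = match[2].split('.');
  // Each segment must match the spec's identifier grammar.
  if (!segments.every((s) => /^[A-Za-z0-9_$]+$/.test(s))) return null;
  if (kind === undefined) return { segments };
  // An unknown kind is an error, not ignored.
  return isKind(kind) ? { kind, segments } : null;
}

// Suffix-matches the dotpath against each nesting chain, then applies the kind filter.
function matchRef(ref: ParsedRef, defs: Definition[]): Outcome {
  const hits = defs.filter((def) => {
    const tail = def.chain.slice(-ref.segments.length);
    const suffixOk =
      tail.length === ref.segments.length &&
      tail.every((name, i) => name === ref.segments[i]);
    const kindOk =
      ref.kind === undefined || KIND_ACCEPTS[ref.kind].indexOf(def.kind) !== -1;
    return suffixOk && kindOk;
  });
  if (hits.length === 1) return 'pass';
  return hits.length === 0 ? 'broken' : 'ambiguous';
}
```

With a top-level `parse` and a nested `Other.parse` in the same file, `#sym:parse` is ambiguous under this sketch because both chains end in `parse`, while `#sym:Other.parse` passes. Adding a parent segment is the qualification step the ambiguity error asks for.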