Core Features
Two-pass extraction
Pass 1 — Structural (free, deterministic)
Runs with `llm-wiki .`:
- Code (18 languages) — tree-sitter AST with rich metadata:
  - Classes, functions, methods, imports
  - Typed inheritance edges: `extends` (base class) and `implements` (interface)
  - Function signatures: parameters + return types preserved verbatim
  - Doc comments (Javadoc, JSDoc, GoDoc, `///`)
  - Call graph (function-to-function calls)
- Markdown/text — headings, definitions, cross-document links
- DOCX/PDF — converted to text, then parsed
- Images — hub nodes (content needs agent mode)
- Cross-reference: code entities mentioned in docs get `mentions` edges
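The call-graph item can be illustrated for Python sources with the standard-library `ast` module. This is a minimal sketch, not llm-wiki's implementation (which is described as tree-sitter based); `extract_calls` is a hypothetical helper, and it resolves calls by simple name only.

```python
import ast


def extract_calls(source: str) -> set[tuple[str, str]]:
    """Return (caller, callee) pairs for calls found inside each function.

    Calls inside a nested function are also attributed to the enclosing one.
    """
    tree = ast.parse(source)
    edges: set[tuple[str, str]] = set()
    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                target = node.func
                # foo(...) -> "foo"; obj.foo(...) -> "foo" (the receiver is ignored)
                if isinstance(target, ast.Name):
                    edges.add((func.name, target.id))
                elif isinstance(target, ast.Attribute):
                    edges.add((func.name, target.attr))
    return edges


SAMPLE = '''
def load(path):
    return open(path).read()

def build(path):
    text = load(path)
    return text.strip()
'''

print(sorted(extract_calls(SAMPLE)))
```

Name-only resolution is cheap and deterministic, but it cannot tell two methods with the same name apart.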
Pass 2 — Semantic (agent mode)
Runs in Claude Code via `/wiki .` and dispatches subagents for files the structural pass can’t handle:
| File Type | Structural | + Agent | Verdict |
|---|---|---|---|
| Code (18 langs) | Full AST + doc comments | — | No agent needed |
| Markdown | Headings + links | 2x entities | Optional |
| DOCX | Hub nodes only | 30x entities | Use agent |
| Scanned PDF | 0 text | 85x entities | Use agent |
| Images (HEIC/PNG/JPG) | Hub nodes only | Vision OCR | Use agent |
Typed inheritance edges
Instead of a single generic inherits relation, the graph distinguishes:
- `extends`: class inherits from a base class
- `implements`: class conforms to an interface/protocol/trait
Languages supported for typed inheritance:
| Language | extends | implements | Notes |
|---|---|---|---|
| Java | ✅ | ✅ | First-class grammar support |
| Python | ✅ | — | Single concept, handles Generic[T] |
| TypeScript | ✅ | ✅ | Separate extends/implements clauses |
| Kotlin | ✅ | — | : delegation_specifiers |
| C# | ✅ | ✅ | First entry = extends, rest = implements |
| C++ | ✅ | — | : public Base |
| Ruby | ✅ | — | < Base (mixins need agent mode) |
| PHP | ✅ | ✅ | Separate clauses |
| Scala | ✅ | ✅ | First = extends, with Trait = implements |
| Swift | ✅ | — | Class base + protocol conformance merged |
Query: llm-wiki query neighbors Serializable → shows all classes implementing it.
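As an illustration of the positional rule in the C# row (first entry = extends, rest = implements), here is a minimal sketch. The `Edge` type and both helper functions are hypothetical names, not part of llm-wiki.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    relation: str  # "extends" or "implements"
    target: str


def positional_inheritance_edges(class_name: str, bases: list[str]) -> list[Edge]:
    """Apply the 'first entry = extends, rest = implements' rule."""
    edges = []
    for index, base in enumerate(bases):
        relation = "extends" if index == 0 else "implements"
        edges.append(Edge(class_name, relation, base))
    return edges


def implementers(edges: list[Edge], interface: str) -> list[str]:
    """Classes with an 'implements' edge pointing at the given interface."""
    return sorted(
        e.source for e in edges if e.relation == "implements" and e.target == interface
    )


# class Invoice : Document, Serializable, IDisposable
edges = positional_inheritance_edges("Invoice", ["Document", "Serializable", "IDisposable"])
# class Receipt : Document, Serializable
edges += positional_inheritance_edges("Receipt", ["Document", "Serializable"])

print(implementers(edges, "Serializable"))
```

A purely positional rule mislabels a class whose base list contains only interfaces; how llm-wiki treats that case is not stated here.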
Function signatures
Every function/method node carries a signature field with params and return type:
llm-wiki query node processOrder
processOrder()
source: src/orders.ts L45
type: code community: 3 degree: 8
signature: (order: Order, user: User): Promise<Result>
doc: Process an order for the given user. Returns result or throws.
Signature extraction supports: Python, TypeScript, JavaScript, Java, Kotlin, C#, C++, Ruby, PHP, Scala, Swift. Works with generic parameters, default values, nullable types.
Test result on a Python codebase (kioku): 529 / 991 code nodes have signatures (53% coverage; classes and untyped functions have no signature).
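To make the `signature` field concrete, the sketch below derives a comparable string for Python functions using the standard-library `ast` module. It is an assumption-laden stand-in for the tree-sitter extraction described here: `signature_of` is a hypothetical helper, and `ast.unparse` normalizes spacing rather than copying the source text verbatim.

```python
import ast


def signature_of(func: ast.FunctionDef) -> str:
    """Render '(params) -> return' from the parsed annotations."""
    params = ast.unparse(func.args)
    returns = f" -> {ast.unparse(func.returns)}" if func.returns else ""
    return f"({params}){returns}"


SOURCE = '''
def process_order(order: Order, user: User) -> Result:
    ...

class Cart:
    pass
'''

tree = ast.parse(SOURCE)
# Only function nodes receive a signature; the class is skipped.
signatures = {
    node.name: signature_of(node)
    for node in ast.walk(tree)
    if isinstance(node, ast.FunctionDef)
}

print(signatures["process_order"])
```

Classes never enter the `signatures` mapping, which is one reason coverage stays well below 100% of code nodes.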
Doc comment extraction
Automatically extracts business logic from inline documentation:
| Language | Format | Example |
|---|---|---|
| Java, Kotlin, Scala, PHP | /** ... */ | Javadoc |
| JavaScript, TypeScript | /** ... */ | JSDoc |
| Go | // ... before func/type | GoDoc |
| Rust | /// | Doc comments |
| C# | /// | XML docs |
| Swift, Ruby | ///, # | Doc comments |
Tested: 1,773 / 12,424 nodes enriched with Javadoc descriptions on a Java codebase.
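For the `/** ... */` formats in the table, extraction can be approximated with a regular expression that pairs each doc block with the declaration line that follows it. This is a simplified sketch under that assumption, not the tool's parser; `doc_comments` and `clean` are hypothetical helpers.

```python
import re

# A /** ... */ block followed by the next non-empty declaration line.
DOC_BLOCK = re.compile(r"/\*\*(.*?)\*/\s*\n\s*([^\n]+)", re.DOTALL)


def clean(body: str) -> str:
    """Strip the leading '*' gutter and collapse the comment into one line."""
    lines = (line.strip().lstrip("*").strip() for line in body.splitlines())
    return " ".join(line for line in lines if line)


def doc_comments(source: str) -> dict[str, str]:
    """Map each declaration line to the text of the doc block above it."""
    return {decl.strip(): clean(body) for body, decl in DOC_BLOCK.findall(source)}


JAVA = """
/**
 * Process an order for the given user.
 * Returns result or throws.
 */
public Result processOrder(Order order, User user) {
"""

for declaration, doc in doc_comments(JAVA).items():
    print(declaration, "=>", doc)
```

A regex cannot tell a doc block from the same characters inside a string literal, which is where an AST-based approach has the advantage.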
Community detection
Leiden/Louvain clustering groups related nodes using graph topology alone, with no embeddings.
- Adaptive resolution: tight for small codebases, broad for >5K nodes
- Semantic labels from top-degree nodes
- Cohesion scores
- Oversized communities auto-split
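The labels and cohesion scores can be pictured with a small pure-Python sketch. Both definitions here are assumptions made for illustration: cohesion is taken to be internal edge density, and a label is built from the two highest-degree members. The formulas llm-wiki actually uses are not given in this document.

```python
from collections import Counter
from itertools import combinations


def cohesion(members: set[str], edges: set[frozenset[str]]) -> float:
    """Fraction of possible member pairs that are connected (assumed definition)."""
    possible = len(members) * (len(members) - 1) // 2
    if possible == 0:
        return 1.0
    internal = sum(
        1 for pair in combinations(sorted(members), 2) if frozenset(pair) in edges
    )
    return internal / possible


def label(members: set[str], edges: set[frozenset[str]], top: int = 2) -> str:
    """Name a community after its highest-degree members (ties broken by name)."""
    # Degree counts every edge touching a member, including edges leaving the community.
    degree = Counter(node for edge in edges for node in edge if node in members)
    ranked = sorted(members, key=lambda n: (-degree[n], n))
    return " / ".join(ranked[:top])


EDGES = {
    frozenset(pair)
    for pair in [("Cart", "Order"), ("Order", "Invoice"), ("Cart", "Invoice"), ("Order", "User")]
}
COMMUNITY = {"Cart", "Order", "Invoice", "User"}

print(round(cohesion(COMMUNITY, EDGES), 2), label(COMMUNITY, EDGES))
```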
Cross-reference code ↔ docs
Automatic mentions edges when a code entity name appears in doc text. Tested: 460 code↔doc edges on a mixed Python repo.
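Name-based cross-referencing can be sketched as a whole-word search over the doc text. The matching rule (case-sensitive, word boundaries) is an assumption, and `mention_edges` is a hypothetical helper rather than the tool's API.

```python
import re


def mention_edges(doc_id: str, text: str, entity_names: list[str]) -> list[tuple[str, str, str]]:
    """Emit (doc, 'mentions', entity) for each entity name found as a whole word."""
    edges = []
    for name in entity_names:
        # \b keeps "Graph" from matching inside "GraphStore".
        if re.search(rf"\b{re.escape(name)}\b", text):
            edges.append((doc_id, "mentions", name))
    return edges


DOC = "GraphStore persists nodes; call build_graph() to refresh it."
print(mention_edges("docs/design.md", DOC, ["GraphStore", "build_graph", "Graph"]))
```

Whole-word matching avoids substring noise, though short or common entity names (for example `run`) would still produce false positives.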
SHA256 cache
File hashes are stored in wiki-out/cache/. Files whose hash is unchanged skip extraction on re-runs, so large codebases (1,000+ files) benefit most on the second and later builds.
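The skip-if-unchanged logic can be sketched with `hashlib` and a JSON manifest. The manifest layout (one JSON object mapping path to digest) and the function names are assumptions; only the use of SHA256 and the cache directory come from the text.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_extract(files: list[Path], manifest: Path) -> list[Path]:
    """Return files whose hash differs from the stored one, then update the manifest."""
    known = json.loads(manifest.read_text()) if manifest.exists() else {}
    changed = []
    for path in files:
        digest = sha256_of(path)
        if known.get(str(path)) != digest:
            changed.append(path)
            known[str(path)] = digest
    manifest.parent.mkdir(parents=True, exist_ok=True)
    manifest.write_text(json.dumps(known, indent=2))
    return changed
```

Hashing content rather than comparing modification times means a `git checkout` that touches files without changing them does not trigger re-extraction.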
CLI
llm-wiki . # build graph
llm-wiki query search <terms> # keyword search
llm-wiki query node <label> # node details + doc comment
llm-wiki query neighbors <label> # direct connections
llm-wiki query community <id> # community members by degree
llm-wiki query path <A> <B> # shortest path
llm-wiki query gods # top 10 most connected
llm-wiki query stats # summary
llm-wiki lint # health check
llm-wiki watch . # auto-rebuild on changes
llm-wiki add <url> # fetch URL as markdown
llm-wiki note "<insight>" [--link <node>] [--tag <tag>] # write-back insight
llm-wiki --no-viz . # skip HTML for large graphs
llm-wiki --version # show version
Write-back from LLM sessions
Karpathy’s vision is a compounding artifact — the wiki grows with every session. llm-wiki note closes the loop:
llm-wiki note "GraphStore uses SHA256 because cache needs stable hash across runs" \
--link GraphStore --tag rationale
The note is saved to wiki-out/ingested/note-<timestamp>-<slug>.md with YAML frontmatter (type, date, tags, links). On the next llm-wiki . rebuild:
- The note file is picked up like any other markdown
- `[[WikiLinks]]` in the body become `mentions` edges to existing nodes
- The insight is searchable via `llm-wiki query search <term>`
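The WikiLink step can be sketched as follows: pull link targets out of the note body and keep only those that name an existing node. The regex, the alias handling, and `note_mentions` are illustrative assumptions, not the tool's actual parser.

```python
import re

# Matches [[Target]] and [[Target|alias]]; only the target is captured.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def note_mentions(note_id: str, body: str, existing_nodes: set[str]) -> list[tuple[str, str, str]]:
    """Turn wikilinks into mentions edges, skipping targets that are not known nodes."""
    edges = []
    for target in WIKILINK.findall(body):
        target = target.strip()
        if target in existing_nodes:
            edges.append((note_id, "mentions", target))
    return edges


BODY = "[[GraphStore]] uses SHA256 because the [[cache|hash cache]] needs a stable hash. See [[Missing]]."
print(note_mentions("note-example", BODY, {"GraphStore", "cache"}))
```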
Claude Code integration: SKILL.md instructs agents to call llm-wiki note proactively when they explain non-obvious rationale, make architectural decisions, or discover hidden constraints. One insight per note, written as why not what. See SKILL.md → Write-back section for the full heuristics.
Obsidian compatibility
wiki-out/vault/ is a drop-in Obsidian vault. Each node becomes one markdown file with:
- `[[WikiLinks]]` for every graph edge, so Obsidian backlinks work immediately
- YAML frontmatter with `id`, `type`, `community`, `degree`, `source_file`, which renders as Obsidian Properties (1.4+)
- Inline `#tags` from community labels, which appear in Obsidian’s tag pane
- Pre-configured graph colors via `.vault/graph.json`, so community coloring matches the vis.js graph
llm-wiki .
# Obsidian → Open folder as vault → select wiki-out/vault/
Trade-off: Obsidian wikilinks are untyped, so extends / implements / calls / mentions all render as generic links in Obsidian’s graph view. Use llm-wiki query neighbors <label> from the CLI for typed-edge detail.
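The trade-off is easy to see in a sketch of how one node might be rendered as a vault page. The frontmatter keys follow the list above; the page layout, the `community_label` field, and `render_vault_page` are assumptions made for illustration.

```python
def render_vault_page(node: dict, edges: list[tuple[str, str]]) -> str:
    """Render one node as markdown: frontmatter, a community tag, one wikilink per edge."""
    fields = ["id", "type", "community", "degree", "source_file"]
    # Simple scalar values only; real YAML output would need quoting rules.
    frontmatter = "\n".join(f"{key}: {node[key]}" for key in fields)
    # The relation name is dropped on purpose: a wikilink cannot carry an edge type.
    links = "\n".join(f"- [[{target}]]" for _relation, target in edges)
    return f"---\n{frontmatter}\n---\n\n#{node['community_label']}\n\n{links}\n"


node = {
    "id": "Invoice",
    "type": "code",
    "community": 3,
    "degree": 2,
    "source_file": "src/billing.py",
    "community_label": "billing",  # hypothetical field holding the tag text
}
edges = [("extends", "Document"), ("calls", "render_pdf")]

page = render_vault_page(node, edges)
print(page)
```

Both edges end up as plain `[[...]]` links, so `extends` and `calls` are indistinguishable once the page is written.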
Schema rules
Create .wikischema for custom entity and relation types:
{
  "entity_types": ["code", "document", "paper", "image", "concept"],
  "relation_types": ["imports", "calls", "references", "explains"]
}
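One way to picture what such a schema enables is a check that flags nodes and edges using undeclared types. This is a hypothetical sketch: the node and edge dictionaries and `schema_violations` are illustrative, and how llm-wiki itself enforces `.wikischema` is not described here.

```python
import json

SCHEMA_TEXT = """
{
  "entity_types": ["code", "document", "paper", "image", "concept"],
  "relation_types": ["imports", "calls", "references", "explains"]
}
"""


def schema_violations(schema: dict, nodes: list[dict], edges: list[dict]) -> list[str]:
    """List nodes and edges whose type is not declared in the schema."""
    entity_types = set(schema["entity_types"])
    relation_types = set(schema["relation_types"])
    problems = [
        f"node {n['id']}: unknown type {n['type']!r}"
        for n in nodes
        if n["type"] not in entity_types
    ]
    problems += [
        f"edge {e['source']}->{e['target']}: unknown relation {e['relation']!r}"
        for e in edges
        if e["relation"] not in relation_types
    ]
    return problems


schema = json.loads(SCHEMA_TEXT)
nodes = [{"id": "GraphStore", "type": "code"}, {"id": "roadmap", "type": "slide"}]
edges = [{"source": "GraphStore", "target": "hashlib", "relation": "imports"}]

print(schema_violations(schema, nodes, edges))
```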