pdf-to-rag
Local-first RAG pipeline over folders of PDFs: PDF → text normalization → chunked passages with metadata → embeddings (Transformers.js / ONNX by default, optional Ollama) → versioned JSON/binary index → semantic query that returns ranked verbatim excerpts with file name and page number for citation. The CLI, library, and MCP (stdio or HTTP) entry points share one application layer (ingest, query, inspect). The project uses GitHub Actions for CI and a layered architecture (commands, application, domain, pipeline modules).
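The pipeline stages above can be illustrated with a minimal, dependency-free sketch. It is not the project's actual code: every type and function name here (Page, Chunk, chunkPages, toyEmbed, buildIndex, query) is hypothetical, the index shape is an assumption, and toyEmbed is a hashed bag-of-words stand-in for a real embedding model such as one loaded through Transformers.js or Ollama.

```typescript
// Hypothetical types; none of these names come from the pdf-to-rag codebase.
interface Page { fileName: string; page: number; text: string; }
interface Chunk { fileName: string; page: number; text: string; }
interface IndexedChunk extends Chunk { vector: number[]; }
interface VectorIndex { version: number; chunks: IndexedChunk[]; }
interface Hit { fileName: string; page: number; excerpt: string; score: number; }

// Collapse whitespace runs so chunk boundaries do not depend on PDF layout.
function normalize(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Split each page into passages of at most maxWords words,
// carrying file name and page number along as citation metadata.
function chunkPages(pages: Page[], maxWords: number): Chunk[] {
  const chunks: Chunk[] = [];
  for (const p of pages) {
    const normalized = normalize(p.text);
    if (normalized === "") continue; // skip empty pages
    const words = normalized.split(" ");
    for (let i = 0; i < words.length; i += maxWords) {
      const text = words.slice(i, i + maxWords).join(" ");
      chunks.push({ fileName: p.fileName, page: p.page, text });
    }
  }
  return chunks;
}

// Toy stand-in for an embedding model (assumption, for illustration only):
// hashed bag-of-words counts, scaled to unit length.
function toyEmbed(text: string, dims: number = 64): number[] {
  const v = new Array<number>(dims).fill(0);
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    let h = 0;
    for (let i = 0; i < word.length; i++) {
      h = (h * 31 + word.charCodeAt(i)) >>> 0; // keep the hash in uint32 range
    }
    v[h % dims] += 1;
  }
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0)) || 1;
  return v.map((x) => x / norm);
}

// Embed every chunk and wrap the result in a versioned, JSON-serializable index.
function buildIndex(chunks: Chunk[]): VectorIndex {
  return {
    version: 1,
    chunks: chunks.map((c) => ({ ...c, vector: toyEmbed(c.text) })),
  };
}

// Rank chunks by cosine similarity to the question. Vectors are unit length,
// so the dot product equals the cosine. Excerpts are returned verbatim.
function query(index: VectorIndex, question: string, topK: number): Hit[] {
  const q = toyEmbed(question);
  return index.chunks
    .map((c) => ({
      fileName: c.fileName,
      page: c.page,
      excerpt: c.text,
      score: c.vector.reduce((s, x, i) => s + x * q[i], 0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

In this sketch, a caller would run chunkPages over extracted page text, pass the result to buildIndex, persist the index with JSON.stringify, and later call query to get excerpts with their file name and page. Swapping toyEmbed for a real model changes only the embedding step; chunking, indexing, and ranking stay the same.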