An anti-fabrication pipeline turns the Brihaddeśī into three layered artefacts: the named bricks the text talks about, the operative rules attached to them, and an executable formal grammar synthesised per domain.
Mentions of the same term across pages are clustered into a single concept. Each concept carries its raw mentions, its page footprint, and its co-occurrence degree.
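As a rough illustration of what such a concept record could look like, here is a minimal Python sketch. The `Concept` class, the `build_concepts` helper, the lower-casing used as the clustering key, and the page numbers in the sample are all assumptions made for illustration. The actual pipeline may cluster mentions quite differently. Co-occurrence is taken here to mean "appears on the same page", which is also an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Concept:
    """Hypothetical record for one clustered concept."""
    name: str
    mentions: list = field(default_factory=list)  # raw (page, surface form) pairs
    pages: set = field(default_factory=set)       # page footprint
    degree: int = 0                               # distinct co-occurring concepts


def build_concepts(mentions, normalise=str.lower):
    """Group (page, surface_form) mentions under a normalised key."""
    concepts = {}
    by_page = defaultdict(set)
    for page, surface in mentions:
        key = normalise(surface)
        concept = concepts.setdefault(key, Concept(name=key))
        concept.mentions.append((page, surface))
        concept.pages.add(page)
        by_page[page].add(key)

    # Assumption: two concepts co-occur when they share at least one page.
    neighbours = defaultdict(set)
    for keys in by_page.values():
        for a, b in combinations(sorted(keys), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    for key, concept in concepts.items():
        concept.degree = len(neighbours[key])
    return concepts


# Illustrative input only; the page numbers are invented.
sample = [(12, "Svara"), (12, "Grāma"), (13, "svara"), (13, "Jāti")]
concepts = build_concepts(sample)
```

In this toy input, the two spellings "Svara" and "svara" collapse into one concept whose page footprint spans both pages.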
For each concept, the operative rules the text asserts. Every rule is backed by ≥1 verbatim quote (anti-fabrication). 10 categories: definition, structural, relation, enumeration, assignment, derivation, classification, composition, validation, transformation.
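The anti-fabrication constraint can be pictured as a check that every quote attached to a rule really occurs in the page text. The sketch below is one possible reading, not the pipeline's actual code: the `Rule` class, `check_rule`, the whitespace normalisation, and the placeholder page text are all hypothetical. Only the ten category names come from the description above.

```python
import re
from dataclasses import dataclass

# The ten rule categories listed above.
CATEGORIES = frozenset({
    "definition", "structural", "relation", "enumeration", "assignment",
    "derivation", "classification", "composition", "validation",
    "transformation",
})


def _squash(text):
    """Collapse runs of whitespace so line breaks do not defeat matching."""
    return re.sub(r"\s+", " ", text).strip()


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record: quotes are (page, verbatim quote) pairs."""
    concept: str
    category: str
    statement: str
    quotes: tuple = ()


def check_rule(rule, page_texts):
    """Return a list of problems; an empty list means the rule passes."""
    problems = []
    if rule.category not in CATEGORIES:
        problems.append(f"unknown category: {rule.category}")
    if not rule.quotes:
        problems.append("no supporting quote")
    for page, quote in rule.quotes:
        needle = _squash(quote)
        if not needle or needle not in _squash(page_texts.get(page, "")):
            problems.append(f"quote not found verbatim on page {page}")
    return problems


# Placeholder page text, not taken from the source work.
pages = {1: "lorem ipsum   dolor\nsit amet"}
good = Rule("svara", "definition", "placeholder statement", ((1, "ipsum dolor sit"),))
bad = Rule("svara", "definition", "placeholder statement", ((1, "ipsum amet"),))
```

A rule with a paraphrased or invented quote, like `bad` here, is rejected because its quote is not a substring of the page it cites.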
Visual structures lifted from the scanned pages (tables, ordered sequences, diagrams) and cited verbatim by the rules. These are the *evidence* behind every "the text says N of X" claim. Five kinds — see below.
Per-domain executable Python modules. Domains derived by Leiden community detection on the concept-edge graph (6c.1); types by formal concept analysis (6c.2); operations + constraints synthesised per domain (6c.3); assembled into a unified package (6c.4).
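To make the formal concept analysis step (6c.2) more concrete, the sketch below enumerates the formal concepts of a tiny context by brute force. It is a textbook-style illustration, not the pipeline's implementation. The choice of objects (`c1` to `c3`, standing in for concepts) and attributes (rule categories) is an assumption, and the exponential enumeration is only suitable for toy inputs.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate formal concepts of a context {object: set(attributes)}.

    Returns a set of (extent, intent) pairs of frozensets. Naive closure
    over every object subset: exponential, for small examples only.
    """
    objects = list(context)
    all_attrs = set().union(*context.values()) if context else set()

    def intent(objs):
        # Attributes shared by every object in objs
        # (all attributes when objs is empty).
        shared = set(all_attrs)
        for obj in objs:
            shared &= context[obj]
        return frozenset(shared)

    def extent(attrs):
        # Objects that carry every attribute in attrs.
        return frozenset(o for o in objects if attrs <= context[o])

    found = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            shared = intent(subset)
            found.add((extent(shared), shared))
    return found


# Hypothetical context: which rule categories each concept has rules in.
context = {
    "c1": {"definition", "enumeration"},
    "c2": {"definition"},
    "c3": {"definition", "enumeration", "derivation"},
}
lattice = formal_concepts(context)
```

Each resulting pair groups the objects that share exactly a given set of attributes, which is the kind of grouping a type could plausibly be derived from.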
Svara, Grāma, Jāti).
JSON-over-HTTP exposure of every layer of this grammar: concepts, rules, structures, formal-grammar manifests, domain handbooks. Designed so a downstream tool (a notebook, a paper's analysis pipeline, another encyclopedia) can consume the grammar without scraping the HTML pages.
/grammar/concepts.
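A downstream consumer might read one layer along these lines. This is a minimal sketch under stated assumptions: the base URL `https://example.org` is a placeholder, the `fetch_layer` and `layer_url` helpers are hypothetical, and only the `/grammar/concepts` path appears in the text above. Paths for other layers and the shape of the JSON payload are not documented here, so the sketch simply returns whatever JSON the server sends.

```python
import io
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def layer_url(base_url, layer):
    """Build <base>/grammar/<layer>; the path pattern is an assumption."""
    return urljoin(base_url.rstrip("/") + "/", f"grammar/{layer}")


def fetch_layer(base_url, layer, opener=urlopen, timeout=10):
    """Fetch one grammar layer and decode the JSON body."""
    with opener(layer_url(base_url, layer), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Offline demonstration with a stand-in opener, so no network is needed.
def _fake_opener(url, timeout):
    return io.BytesIO(json.dumps({"url": url}).encode("utf-8"))


payload = fetch_layer("https://example.org", "concepts", opener=_fake_opener)
```

Passing the opener as a parameter keeps the function testable without a live server. In real use the default `urlopen` would be left in place and the placeholder base URL replaced by the actual host.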