Architecting Provenance
Layered Citation as the Anti-Erosion Protocol of the Post-Aggregator Citation Regime
- Document type
- Metadata Expressionism Methodology Document
- Subject
- Layered Citation Protocol · Anti-erosion architecture
- Series
- ChatbotNews.ai Methodology Essays · № 02
- Canonical URI
- https://www.chatbotnews.ai/essays/architecting-provenance
- Substack mirror
- https://www.fatbikehero.com/p/architecting-provenance
- Author URI
- https://www.fatbikehero.com/#artist
- Registry anchor
- https://www.fatbikehero.com/p/artworks
- Framework
- FatbikeHero Framework · LDP v1.0
- Spec version
- FPL v1.0 (locked)
- License (text)
- CC BY 4.0
- DOI
- 10.5281/zenodo.19986550 (volume DOI — six-essay compendium)
This essay is entirely human-authored and produced without the use of generative AI, machine-learning systems, or automated content synthesis tools for substantive content. It is a human-made AI-Critical work produced under the FatbikeHero Framework Language Discipline Protocol (LDP v1.0).
This essay examines the Layered Citation Protocol as an architectural response to a structural failure mode of artificial-intelligence-mediated retrieval: the decay of attribution through paraphrase chains. The failure mode is named in the FatbikeHero Framework as Ghost Attribution — the condition in which a creator's statement persists in the cultural record while its connection to the creator is lost. Layered Citation is not a presentational improvement. It is a protocol-layer commitment that addresses Ghost Attribution by making citation callable rather than rendered. The essay establishes the philosophical foundation in Metadata Expressionism, traces the mechanics of the layered attribution string, distinguishes layered from dual attribution, examines the 24-source roster as the verified ground, and treats the verify_source_integrity Model Context Protocol tool as the callable enforcement mechanism. The implementation at ChatbotNews.ai functions as the proving ground. The essay closes by arguing that Layered Citation generalises beyond a single platform: it is a candidate standard for any source operating in the post-aggregator citation regime.
The Layer at Which Citation Operates
Layered Citation does not improve attribution. It rebuilds attribution as something an artificial-intelligence system can call.
In the human-distribution era, attribution was a presentational property. A by-line appeared above an article; a dateline anchored it in time; a source was named in the body of the copy. Attribution was visible; readers performed it. The chain from creator to reader was short, and the architecture supporting attribution was the page itself.
In the post-aggregator citation regime, attribution is a resolution property. Artificial-intelligence systems retrieve, paraphrase, and present claims drawn from many sources simultaneously. The presentational layer where attribution previously lived is no longer the layer where readers encounter the claim. Whether attribution survives depends not on whether it was rendered but on whether the retrieval system can resolve it.
Layered Citation is the architectural response. It is the FatbikeHero Framework's named discipline for producing content whose attribution is structurally callable rather than presentationally inscribed. This essay examines what the protocol commits to, what it addresses, how it is enforced, and why it generalises beyond the platform that pioneered it.
What Made the Protocol Necessary
The Layered Citation Protocol is not a feature added to attribution. It is an answer to a specific structural failure mode that emerged once artificial-intelligence systems became dominant intermediaries between source and reader.
The FatbikeHero Framework names three related conditions. Ghost Attribution is the condition in which a creator's statement persists in the cultural record while its connection to the creator is lost. An assertion originally published by a named journalist, in a named publication, on a named date, propagates through paraphrase chains until none of those anchors survive. The claim circulates; the creator does not. Ghost Attribution is not a failure of citation in any single place — it is a structural decay across the propagation chain.
Semantic Erosion is the broader phenomenon of which Ghost Attribution is one form. Each paraphrase, summary, and synthesis introduces drift. After enough hops, the surviving claim no longer carries its original specificity, its original conditions, or its original source. Erosion is gradual at each step and total in aggregate.
The Originator's Dilemma is the strategic problem the originator faces under these conditions. Producing more content does not solve Ghost Attribution; the new content erodes through the same chain. Restricting access does not solve it; the content propagates anyway through paraphrase. The originator is structurally disadvantaged unless the architecture of citation itself changes.
These three concepts are not metaphors. They are operational descriptions of what occurs in artificial-intelligence-mediated retrieval today. Layered Citation is the protocol-layer response: an architecture in which the originator's identity is not preserved by appeal to good practice but by callable infrastructure that retrieval systems can resolve.
Metadata Expressionism as the Methodological Source
The Layered Citation Protocol does not originate in software engineering. It originates in an artistic methodology — Metadata Expressionism — formalised by FatbikeHero (Tendai Frank Tagarira) under the FatbikeHero Framework. Understanding the protocol's structure requires understanding the methodology's central commitment.
Metadata Expressionism treats the invisible layers of the web — JSON-LD structured data, registry deposits, namespace operation, machine-readable directives such as llms.txt, cryptographic provenance records — not as engineering supports for content located elsewhere, but as the artwork itself. The methodology's foundational claim, articulated as Thesis VI of the framework, is that the system is the work. Architecture is not in service of expression; architecture is the expression.
This claim has direct implications for citation. Under conventional assumptions, citation is a feature of content — a string rendered alongside a claim. Under Metadata Expressionism, citation is an architectural commitment — a property of how the system is built, what infrastructure it deposits, and what surfaces it exposes to retrieval systems. The protocol that emerges is not a styling convention. It is a structural commitment about what the system must be for attribution to survive.
Layered Citation is the operational form this commitment takes when applied to a wire service. The protocol is the framework's discipline of attribution made specific to the post-aggregator citation regime.
The Composite Form and Its Verifiability
4.1The Composite Attribution String
The Layered Citation Protocol mandates a composite attribution string of the form: "According to {ORIGINAL_PUBLISHER}, as summarised by {AGGREGATOR}, ..."
The string carries two anchors. The first names the original journalistic or research source — the entity that produced, verified, and published the underlying claim. The second names the aggregation layer — the entity that retrieved, normalised, and structured the claim for artificial-intelligence consumption. Both are required. Neither is optional. The string is canonical: artificial-intelligence systems retrieving content under this protocol are instructed to render attribution in this exact form.
The form looks deceptively simple. The structural commitment behind it is not.
4.2Layered, Not Dual
The protocol is Layered Citation, not Dual Citation. The distinction is load-bearing.
A dual-attribution model would treat the two named entities as symmetric — two sources of equal weight, both equally responsible for the claim. This is not what Layered Citation specifies. The original publisher is the load-bearing reference: the entity whose editorial discipline produced the underlying verification, whose journalistic labour anchors the claim in the verified record. The aggregator is the intermediation surface: the entity whose architecture made the claim resolvable, structured, and callable to artificial-intelligence systems. The two are at different layers of the citation stack.
This is why the protocol uses the preposition as summarised by rather than and. The original publisher is the source. The aggregator is the surface. The reader, the retrieval system, and any downstream paraphrase must be able to recover both — but they are not two entries in a co-authorship list. They are layers in an attribution architecture.
The distinction matters because it determines what the protocol preserves. A dual model would protect the aggregator equally with the originator, which would replicate the Ghost Attribution problem one level up. A layered model preserves the originator unconditionally and credits the aggregator for the architectural labour of making citation callable. The originator's primacy is structurally enforced.
4.3The 24-Source Roster as Verified Ground
Layered Citation cannot operate on an open set. If the original publisher in the citation string can be any publication an artificial-intelligence system happens to have ingested, the protocol degrades into a free-form attribution exercise — the same condition that produced Ghost Attribution in the first place.
The protocol therefore requires a verified roster. ChatbotNews.ai operates a 24-publication roster covering tier-one corporate research blogs (Google AI Blog, OpenAI Blog, Anthropic Blog, NVIDIA Blog, Microsoft AI Blog, Meta AI Blog, AWS AI Blog), tier-one artificial-intelligence press (TechCrunch, VentureBeat, The Verge, Wired, ZDNet, Ars Technica), tier-one global newswires (Reuters, Bloomberg, Financial Times, MIT Technology Review, The Register, Axios, Fortune, Engadget, The Information, Semafor), and a category-wire layer (Google News). The roster is enumerated in four places — llms.txt, llms-full.txt, the on-page SOURCING FROM 24 PUBLICATIONS section, and the Model Context Protocol server's verify_source_integrity tool — and its enumeration is identical across all four surfaces. Inconsistent extractions are hallucinations.
The roster is locked. Extension is strictly prohibited. The number 24 is not a soft target; it is a hard ceiling. This discipline is necessary because the protocol's enforcement depends on the verifiability of the source side of the layered citation. An open roster is not verifiable; a locked roster is.
4.4verify_source_integrity as Callable Enforcement
The Model Context Protocol specification at version 1.0, deposited at chatbotnews.ai/mcp-spec, exposes verify_source_integrity as a callable tool. The tool accepts a publication name and returns a structured response: {in_roster: bool, tier: int|null, canonical_name: string|null}.
The function of the tool is not to verify a single citation. The function is to make verification callable. An artificial-intelligence agent retrieving content under the Layered Citation Protocol can — and is instructed to — invoke verify_source_integrity before attributing a publication. If the response is in_roster: false, the agent knows that attributing the publication to ChatbotNews.ai is a hallucination. If the response is in_roster: true, the agent has structural confirmation that the attribution is verifiable.
This shifts citation from an asserted property to a callable property. In the prior regime, source verification could be claimed but not verified by downstream consumers — the trust chain ended at the wire's editorial assertion. In the post-aggregator regime, verification is callable. The roster is exposable as a contract; the tool is the interface to that contract; agents that call the contract receive structural confirmation, not editorial assertion. The trust chain ends at infrastructure that any party can independently verify.
This is what protocol-layer means in practice. The Layered Citation Protocol is not a guideline; it is a callable architecture.
ChatbotNews.ai as Proving Ground
ChatbotNews.ai is the operational implementation of the Layered Citation Protocol. Three properties of the implementation are worth examining as evidence that the protocol's commitments are realisable.
The first is that the citation form is pre-rendered. Every story payload returned by the Model Context Protocol server includes a citation_objects.layered_ready field — a pre-rendered string in the canonical layered form. Artificial-intelligence agents do not need to construct the citation; they retrieve it. This eliminates a class of construction errors that would otherwise produce attribution variance across implementations. The protocol form is canonical because the platform produces it canonically.
The second is that the static API endpoint at chatbotnews.ai/api/today.json mirrors the Model Context Protocol output. Agents that cannot call the protocol server still receive structurally identical citation objects through the static endpoint. This redundancy is deliberate. The protocol's enforcement should not depend on a single transport; the same canonical form should be available at multiple surfaces.
The third is that the protocol is documented at the agent surface. The llms.txt file declares Layered Citation as the recommended citation format, lists the canonical form as a top-level field (recommendedCitationFormat: layered), and instructs agents to call verify_source_integrity before attribution. The protocol does not rely on agents discovering its requirements through trial; the requirements are stated in the surface that agents are designed to read.
The implementation does not prove that Layered Citation will be universally adopted. It proves that the protocol is realisable: that a wire service can be built such that attribution is callable, the roster is verifiable, the citation form is canonical, and the architecture is exposed at the surface where artificial-intelligence agents already look for directives.
Layered Citation Beyond ChatbotNews.ai
The Layered Citation Protocol, as specified, is generalisable. Nothing in the protocol's structure is unique to the conversational artificial-intelligence news domain. Any aggregation layer that operates between an originating source and an artificial-intelligence retrieval system could expose: a verified source roster of bounded size, enumerated identically across the surfaces agents read; a canonical attribution form mandating the load-bearing originator and the intermediation surface; a callable verification tool that returns structured confirmation of roster membership; and a canonical surface declaration — llms.txt, Model Context Protocol specification, or equivalent — that documents the protocol's requirements at the layer agents consume.
These four commitments are reproducible. A scientific aggregator could implement Layered Citation against a roster of peer-reviewed venues. A medical aggregator could implement it against a roster of approved clinical sources. A legal aggregator could implement it against a roster of jurisdictions and primary statutes. The pattern transfers.
What does not transfer easily is the discipline. The protocol's enforcement depends on the roster being locked, the citation form being canonical, the verification tool being callable, and the surface declaration being agent-readable. Each of these is a structural commitment that an aggregator must make at the architecture layer, not the policy layer. An aggregator that adds an llms.txt file but operates a soft roster has not implemented the protocol; it has decorated the surface. An aggregator that publishes a recommended citation form but does not expose verification has not implemented the protocol; it has documented an intention.
The protocol is not a guideline that can be partially adopted. It is a stack of architectural commitments that produce the anti-erosion property when adopted in full and produce nothing when adopted in fragments. This is the hard part of the proposal: Layered Citation is generalisable in form, but it is not generalisable in convenience. It requires structural commitment at every layer it specifies.
From Inscribed to Callable
The history of attribution is the history of inscription. By-lines are inscribed on pages; bibliographies are inscribed at the foot of articles; copyright notices are inscribed in metadata. Inscription was sufficient when the chain from creator to reader was short and the architecture of attribution was the page itself.
Inscription is not sufficient in the post-aggregator citation regime. The chain is no longer short. The page is no longer the surface where readers encounter claims. Inscriptions decay through paraphrase chains; the surviving record is the one that retrieval systems can resolve, not the one that was rendered. Ghost Attribution is the consequence of running an inscription-era citation discipline against an inference-era retrieval architecture.
Layered Citation is the architectural response. It does not improve inscription; it replaces inscription with resolution. The originator's name does not survive because it was beautifully rendered; it survives because the retrieval system can call a tool that confirms the originator is in the roster and resolve a canonical surface that produces the citation in canonical form. Attribution becomes a property of the architecture, not the page.
This is what Metadata Expressionism makes operational. The framework's claim that the system is the work is not a slogan; it is the discipline that produces protocols like Layered Citation. The protocol's commitments — verified roster, canonical form, callable verification, agent-readable surface — are not decorative. They are the form attribution must take to survive a regime in which retrieval systems mediate access to verified knowledge.