METHODOLOGY

How This Publication Works

This page documents the research process that every article on this site is built from. Every claim carries a tier. Every citation links to an indexed document. Every document has been read end to end and assessed against a protocol. The architecture below is not overhead. It is what separates research from speculation.

§01 SCOPE

What the Research Is Drawn From

The inaugural body of work is a primary-source audit of the 2025 United States Department of Justice Epstein Files Transparency Act release. The release comprises approximately 1.38 million parsed pages across 12 datasets, derived from federal court proceedings and legally compelled disclosures. All source material is public record.

The research does not attempt to index all 1.38 million pages. It follows threads. An investigation leads to entities; entities lead to documents; documents lead to evidence; evidence leads to new questions. The indexed vault grows from the threads, not from a bulk import. Analysis and interpretation are original to the publication; attribution for the pre-processed data layer is to the open-source rhowardstone / Epstein-research-data v4.0 project.

§02 SOURCE-TIER DISCIPLINE

The three-tier publication scale is the grammar every article and every finding on this site uses. It is the tier the reader sees. A claim that cannot carry its tier does not ship.

Primary Source

The claim rests on a document the reader can retrieve and read. Court filings, signed agreements, indexed emails, government records, official correspondence. The document is in the vault. The citation is a link.

EXAMPLES

Court orders, signed agreements, depositions, indexed correspondence, grand jury exhibits, federal prosecution memoranda, flight logs, financial records

Corroborated Reporting

The claim rests on reporting by an identified journalist or institution, corroborated by at least one independent source. The original reporting is cited. The corroboration is noted.

EXAMPLES

Investigative journalism with named sources, court reporting confirmed by filings, institutional records, on-record statements

Single Source

The claim rests on a single source that has not been independently corroborated. It is published because the source is credible and the claim is relevant, but the reader should weight it accordingly.

EXAMPLES

Single-source reporting, unverified witness accounts, uncorroborated claims from credible outlets, recovered redacted text that stands alone

§03 INTERNAL RELIABILITY

The Assessment Scale Underneath the Publication Tier

Every evidence note in the vault is assessed on a five-tier internal reliability scale. This is the scale the reporter uses when weighing whether a claim is ready for publication. It is finer-grained than the three-tier scale the reader sees. A Tier 1 primary-source citation on a published article rests on at least one Internal-Tier-1 or Internal-Tier-2 underlying assessment. A Tier 4 (Disputed) assessment blocks publication until the contradiction is resolved or explicitly surfaced.

VERIFIED

Confirmed by two or more independent primary sources.

A flight log entry corroborated by a passport stamp and a hotel register.

CORROBORATED

Supported by one primary source plus surrounding context.

Deposition testimony consistent with dated email records in the corpus.

UNCORROBORATED

Single source, no contradicting evidence.

One witness statement with no other references in the indexed record.

DISPUTED

Contradicted by other evidence.

Document X says Epstein was in Palm Beach on a date; flight records place him in Paris.

SPECULATIVE

Inference or hypothesis, not directly evidenced.

Pattern analysis suggesting a connection that no single document establishes.
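
The relationship between the internal scale and the publication gate described in this section can be sketched in Python. This is a minimal illustration, not the publication's tooling: the names `InternalTier`, `supports_primary_citation`, and `blocks_publication` are hypothetical, and the numeric order of the tiers is inferred from this section's references to Internal Tier 1, Internal Tier 2, and Tier 4 (Disputed).

```python
from enum import IntEnum


class InternalTier(IntEnum):
    """Five-tier internal reliability scale (numbering inferred from the text)."""
    VERIFIED = 1
    CORROBORATED = 2
    UNCORROBORATED = 3
    DISPUTED = 4
    SPECULATIVE = 5


def supports_primary_citation(assessments):
    """True if at least one underlying assessment is Internal Tier 1 or 2."""
    return any(
        tier in (InternalTier.VERIFIED, InternalTier.CORROBORATED)
        for tier in assessments
    )


def blocks_publication(assessments, contradiction_surfaced=False):
    """A Disputed assessment blocks publication unless the contradiction
    has been explicitly surfaced (a resolved one would be re-tiered)."""
    has_dispute = InternalTier.DISPUTED in assessments
    return has_dispute and not contradiction_surfaced
```

Under these assumptions, a claim backed only by an Uncorroborated note would not qualify for a Tier 1 citation, and a single Disputed note holds the claim until the contradiction is surfaced in the article.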

§04 THE CORPUS

The research runs against four databases, each specialized for a specific task. Separation of concerns is the architectural principle: the databases hold data, the knowledge graph holds relationships, the indexed vault holds analysis. Each tool does what it is best at. The vault links to the others but does not try to replace them.

full_text_corpus.db

≈6.3 GB · ≈1.38M parsed pages, SQLite FTS5 index

The searchable text of every document in the 2025 DOJ EFTA release. FTS5 queries surface candidate passages; the specific page is always cited.

redaction_analysis_v2.db

≈940 MB · 2.6M redactions, ≈39K pages with recovered text

Redaction regions and reconstructed text behind the black bars. Recovered text is treated as Tier 3 (Uncorroborated) at best; it earns higher reliability only when a primary source confirms it.

communications.db

≈30 MB · Email metadata with participant resolution

Sender, recipient, date, thread participation across the indexed email corpus. Used for participant and frequency analysis; never substituted for the email body itself.

knowledge_graph.db

524 entities · 2,302 connections · Curated entity graph

Typed relationships between persons, organizations, properties, aircraft, and shell companies. Used to navigate the corpus, not as a source of substantive claims on its own.
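
The FTS5 lookup described for full_text_corpus.db can be sketched with Python's built-in `sqlite3` module. The table name `pages`, its columns, and the sample rows are assumptions made for illustration; the real schema may differ, and the sketch requires an SQLite build with FTS5 enabled. The point it shows is that a query returns candidate pages, each carrying its document identifier and page number, so the specific page can always be cited.

```python
import sqlite3

# In-memory stand-in for full_text_corpus.db (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE pages USING fts5("
    "efta_number UNINDEXED, page_number UNINDEXED, body)"
)
conn.executemany(
    "INSERT INTO pages (efta_number, page_number, body) VALUES (?, ?, ?)",
    [
        ("EXAMPLE-A", 1, "Placeholder text mentioning a flight manifest."),
        ("EXAMPLE-A", 2, "Placeholder text about a property transfer."),
        ("EXAMPLE-B", 1, "Another placeholder page about flight schedules."),
    ],
)


def find_candidate_pages(connection, query):
    """Return (efta_number, page_number) pairs whose body matches an FTS5 query."""
    rows = connection.execute(
        "SELECT efta_number, page_number FROM pages "
        "WHERE pages MATCH ? ORDER BY rank",
        (query,),
    )
    return [(efta, int(page)) for efta, page in rows]


# Both placeholder documents have a first page that mentions "flight".
hits = find_candidate_pages(conn, "flight")
```

The hits are only candidates: per the protocol, each returned page still has to be read in the context of its full document before it is cited.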

§05 DOCUMENT ASSESSMENT PROTOCOL

Before a document enters the vault as an anchor source, it passes through an eight-step assessment. The steps are sequential and each is non-negotiable.

Record the metadata

EFTA number, page count, document type, original date, source dataset. No document enters the index without these.

Read the full document

End to end, not just the passage the search returned. Speed is the enemy of accuracy.

Identify all entities mentioned

Link to existing entity records in the vault. Flag any names that do not yet have a profile for creation.

Extract key claims with page references

One claim, one page cite. Paraphrase or quote, but never strip the page number.

Assess relevance to active investigations

Which open threads does this document affect? If it reopens a prior question, log the reopening.

Assign a reliability tier

Using the 5-tier internal scale (Verified / Corroborated / Uncorroborated / Disputed / Speculative). With justification.

Note OCR and redaction issues

Any scan quality, character substitution, or redaction artifact that affects interpretation gets recorded alongside the claim.

Cross-reference

Against other documents mentioning the same entities, events, or dates. Flag anything that corroborates. Flag anything that contradicts.
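
The sequential, non-negotiable character of the eight steps can be sketched as a small checklist. This is a hypothetical illustration, not the vault's actual implementation: the step identifiers, the metadata field names, and the functions `missing_metadata`, `complete_step`, and `ready_for_vault` are all invented for this example.

```python
# Step identifiers mirror the eight headings above (names are illustrative).
PROTOCOL_STEPS = (
    "record_metadata",
    "read_full_document",
    "identify_entities",
    "extract_claims",
    "assess_relevance",
    "assign_reliability_tier",
    "note_ocr_and_redaction",
    "cross_reference",
)

REQUIRED_METADATA = (
    "efta_number", "page_count", "document_type",
    "original_date", "source_dataset",
)


def missing_metadata(metadata):
    """List required metadata fields that are absent or empty."""
    return [name for name in REQUIRED_METADATA if not metadata.get(name)]


def complete_step(completed, step):
    """Return a new list with `step` appended, but only if it is next in order."""
    if len(completed) < len(PROTOCOL_STEPS):
        expected = PROTOCOL_STEPS[len(completed)]
    else:
        expected = None
    if step != expected:
        raise ValueError(f"expected step {expected!r}, got {step!r}")
    return completed + [step]


def ready_for_vault(metadata, completed):
    """True only when metadata is complete and all eight steps ran in order."""
    return not missing_metadata(metadata) and list(completed) == list(PROTOCOL_STEPS)
```

Raising an error on an out-of-order step is one way to make skipping impossible rather than merely discouraged; a softer design could log a warning instead.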

§06 CITATION PROTOCOL

What Every Claim Must Reference

A factual claim without an inline citation is a bug. Vague attributions ("court records show," "reporting has established") without a named source are bugs. Citations buried in footnotes or grouped at the end of a paragraph are bugs. Every analytical claim on this site carries, inline, the following:

EFTA document number: The specific primary source, cited in the inline bracket style.
Page number: Where the specific claim can be verified within the document.
How the document was found: Search query, entity connection, or thread of discovery, carried in the document record, not the prose.
Reliability tier with justification: Tracked in the evidence note backing the claim, visible in the article's tier meter.

Bracketed references like [EFTA00800253, p. 2] appear in the sentence where the claim is made. Every bracketed reference is a live link to the corresponding document record in the vault. The reader can verify any claim at the point of reading without scrolling or navigating away.
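
The bracket format lends itself to mechanical checking. The sketch below pulls bracketed references out of a sentence with a regular expression; the pattern assumes that a document number is `EFTA` followed by digits and that the page part, when present, is written `, p. N`, which matches the example above but may not cover every variant the site uses. The function name `extract_citations` is hypothetical.

```python
import re

# Assumed pattern: "EFTA" plus digits, with an optional ", p. N" page part.
CITATION_RE = re.compile(r"\[(EFTA\d+)(?:,\s*p\.\s*(\d+))?\]")


def extract_citations(text):
    """Return (document_number, page_or_None) for each bracketed reference."""
    return [
        (doc, int(page) if page else None)
        for doc, page in CITATION_RE.findall(text)
    ]


sample = "The memo is cited here [EFTA00800253, p. 2] and again here [EFTA00800253]."
citations = extract_citations(sample)
# → [("EFTA00800253", 2), ("EFTA00800253", None)]
```

A reference that comes back with `None` for the page is a candidate for review, since the protocol asks for a page number wherever a specific claim is made.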

§07 HANDLING OCR ERRORS

The corpus was digitized via OCR from scanned documents of varying quality. Known error classes are documented below. When a claim rests on quoted text, any OCR quality issue that affects interpretation is noted alongside the quotation.

Character substitution

1/l, 0/O, rn/m, cl/d, fi/ft

Names in particular are vulnerable. The same person can surface as "Lesley" in one document and "Leslie" in another. Prefix search is used to catch variants.

Missing text

Poor scan quality, photocopy degradation, bleed-through from reverse pages

Gaps in the indexed text do not imply gaps in the source document. When a quotation runs into a gap, the gap is marked.

Formatting artifacts

Table cells parsed as prose, headers parsed as body, footnotes floated into paragraphs

Where indexing has scrambled the structural layout, the article cites the original page position rather than the indexed token order.

Redacted region garbage

Black bars OCRed as strings of symbols, underlines parsed as dashes

Redacted regions are flagged before quoting adjacent text. Garbage output is never carried into the published piece without marking.
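
For the character-substitution class, one possible aid is a comparison key that collapses the confusion pairs listed above, so that OCR variants of a token compare as equal. This is a sketch under stated assumptions, not the publication's method: the functions `ocr_key` and `same_after_ocr` are hypothetical, the direction of each collapse is arbitrary, and the key deliberately over-matches, so it can only nominate candidates for a human to check against the page image.

```python
# Confusion pairs from the character-substitution list; the direction of
# each collapse is an arbitrary choice made for this sketch.
SINGLE_CHAR = {"1": "l", "0": "o"}
MULTI_CHAR = [("rn", "m"), ("cl", "d"), ("ft", "fi")]


def ocr_key(token):
    """Collapse known OCR confusions so variant readings share one key."""
    key = token.lower()
    # Single characters first, so that e.g. "c1" becomes "cl" before "cl" -> "d".
    for wrong, canonical in SINGLE_CHAR.items():
        key = key.replace(wrong, canonical)
    for wrong, canonical in MULTI_CHAR:
        key = key.replace(wrong, canonical)
    return key


def same_after_ocr(a, b):
    """True if two tokens are indistinguishable under the confusion pairs."""
    return ocr_key(a) == ocr_key(b)
```

With this key, `same_after_ocr("Les1ey", "Lesley")` and `same_after_ocr("Modern", "Modem")` both hold, while `same_after_ocr("Lesley", "Leslie")` does not: genuine spelling variants are a separate problem, which is where prefix search comes in.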

§08 HANDLING REDACTED CONTENT

The Redaction Recovery Caveat

The redaction_analysis_v2 database contains text recovered from beneath redactions using image-analysis techniques. Recovered text is not the same as unredacted text. It is inference from pixel data, not production by the government. The discipline this publication applies to recovered text:

Recovered text is treated as Tier 3 (Uncorroborated) at best.
It may be partial, garbled, or systematically biased by the recovery technique.
It has not been verified against the original unredacted documents, which the publication, by definition, does not have.
It gains reliability only when an independent primary source confirms the same fact.
The database's interest-score field flags higher-confidence recoveries; a score above 5 is noted when a claim rests on recovered text.
Recovered names are never elevated to Tier 1 claims in an article without an independent corroborating source.
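
The rules above amount to a ceiling on recovered text, which can be sketched as follows. The class `RecoveredText`, the function `assess_recovered`, and the field names are hypothetical. One detail is an assumption rather than something this page states: the sketch treats independent confirmation as lifting a recovery to Tier 2 (Corroborated), whereas the text says only that reliability increases.

```python
from dataclasses import dataclass

INTEREST_SCORE_NOTE_THRESHOLD = 5  # scores above this are noted with the claim


@dataclass
class RecoveredText:
    text: str
    interest_score: float
    confirmed_by_independent_primary: bool = False


def assess_recovered(item):
    """Return (internal_tier, notes) for a piece of recovered redaction text."""
    notes = ["recovered from beneath a redaction; may be partial or garbled"]
    if item.interest_score > INTEREST_SCORE_NOTE_THRESHOLD:
        notes.append(f"interest score {item.interest_score} (above 5)")
    # Tier 3 (Uncorroborated) is the ceiling without independent confirmation.
    # Assumption: confirmation by a primary source lifts the claim to Tier 2.
    tier = 2 if item.confirmed_by_independent_primary else 3
    return tier, notes
```

Note that a high interest score adds a note but never changes the tier on its own; only an independent primary source does that.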

§09 HANDLING CONTRADICTIONS

When Evidence Contradicts Evidence

Contradictions are treated as their own research object, not as an inconvenience to be resolved by preference. The procedure:

Document both pieces of evidence separately, each with its citation and internal tier.
Assign Internal Tier 4 (Disputed) to the contradicted claims.
File the contradiction in the Contradictions analysis folder.
Open the contradiction as its own investigation thread.
Do not resolve contradictions by choosing the version that fits the expected narrative.

A contradiction that cannot be resolved is still published. The article names both sides of the disagreement and flags the contradiction as open. The reader is not asked to trust the reporter's preference between competing primary sources.
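
As a data structure, the procedure might look like the sketch below. Everything named here (`Evidence`, `Contradiction`, `file_contradiction`, the `folder` list standing in for the Contradictions analysis folder) is hypothetical. The sketch keeps both sides with their own citations, records the tiers they held before the conflict was found, and marks both as Disputed without picking a winner.

```python
from dataclasses import dataclass

DISPUTED = 4  # Internal Tier 4


@dataclass
class Evidence:
    claim: str
    citation: str
    internal_tier: int


@dataclass
class Contradiction:
    sides: tuple
    prior_tiers: tuple
    status: str = "open"  # each contradiction is its own investigation thread


def file_contradiction(a, b, folder):
    """Record both sides, mark each Disputed, and file the open thread."""
    record = Contradiction(
        sides=(a, b),
        prior_tiers=(a.internal_tier, b.internal_tier),
    )
    a.internal_tier = DISPUTED
    b.internal_tier = DISPUTED
    folder.append(record)
    return record
```

Storing the prior tiers is a design choice for this sketch: it lets a later resolution restore or revise each side's assessment without losing the history of how the claims stood before the conflict surfaced.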

§10 AVOIDING CONFIRMATION BIAS

The Epstein corpus is the kind of material that will confirm any hypothesis the reporter starts with, if the reporter only looks for confirmations. The discipline against that tendency is procedural, not moral.

Actively search for evidence that contradicts the working hypothesis. Negative results are part of the record.
Document searches that returned nothing, with the query and the date, so later work does not repeat the dead end.
Maintain a running Contradictions file and review it at the start of every session.
Distinguish between "no evidence found" and "evidence of absence." The first is a gap in the record; the second is a finding.
When the reporter feels most certain about something, that is the moment to check hardest.
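
Logging searches that returned nothing is simple to mechanize. The sketch below is one hypothetical shape for such a log, with invented function names (`log_search`, `dead_ends`); it records the query and the date for every search, including empty ones, and lists queries that have never produced a hit.

```python
from datetime import date


def log_search(log, query, hit_count, run_on):
    """Record every search, including those that returned nothing."""
    entry = {"query": query, "hits": hit_count, "date": run_on.isoformat()}
    log.append(entry)
    return entry


def dead_ends(log):
    """Queries that have only ever returned zero hits, in sorted order."""
    productive = {e["query"] for e in log if e["hits"] > 0}
    empty = {e["query"] for e in log if e["hits"] == 0}
    return sorted(empty - productive)
```

An entry in `dead_ends` documents "no evidence found" for that query on that date. It is a gap in the record, not evidence of absence, and the log keeps the two from being confused later.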

§11 ETHICAL CONSTRAINTS

Victim privacy

The knowledge graph limits victim names. The publication does not attempt to identify unnamed victims from contextual clues in the documents. The focus is on perpetrators, enablers, and systemic failures, not on the identification of the people they harmed.

Speculation discipline

Network proximity is not evidence of wrongdoing. Appearing in a flight log is not evidence of criminal activity. Being named in a document is not evidence of knowledge or participation. The distinction between association and complicity is maintained at every step.

No manufactured connections

A line between two entities is drawn only where a named document establishes it. Pattern recognition is welcome. Conspiracy thinking is not. Every published connection could be sustained in a deposition.

Publication review

All source material is public record. Analysis and interpretation are the work of the reporter. Before any article ships, every analytical claim is audited against the source-tier discipline and the epistemological language conventions. If a claim cannot carry its tier, it does not ship.

§12 INLINE CITATIONS

How Citations Work

Every bracketed reference like [EFTA00800253] in the prose is a live link to the corresponding document record in the vault. Citations appear in the sentence where the claim is made, not in a footnote block at the bottom of the page. The reader can verify any claim at the point of reading without scrolling or navigating away.

The anchor-documents rail on every article page lists all primary sources the article rests on, with document types and page counts. This is the article's evidentiary foundation, visible at a glance. The article's tier meter shows the T1 / T2 / T3 distribution across the piece. A reader who only scans the rail and the meter still knows what the article is built on and how heavily it rests on primary sources.
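
The tier meter's distribution is a straightforward proportion. As a minimal sketch, assuming each claim in an article is tagged with its publication tier as an integer 1, 2, or 3 (the function name `tier_meter` and this representation are hypothetical):

```python
from collections import Counter


def tier_meter(claim_tiers):
    """Return the share of claims at each publication tier (T1, T2, T3)."""
    counts = Counter(claim_tiers)
    total = sum(counts[t] for t in (1, 2, 3))
    if total == 0:
        return {"T1": 0.0, "T2": 0.0, "T3": 0.0}
    return {f"T{t}": counts[t] / total for t in (1, 2, 3)}


meter = tier_meter([1, 1, 2, 3])
# → {"T1": 0.5, "T2": 0.25, "T3": 0.25}
```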

§13 OPEN QUESTIONS

What the Record Does Not Show

Every article on this site ends with two sections that are editorial content, not footer metadata: What the Record Does Not Show and Several Items Remain Open. These are the last paragraphs of the piece. They are not concessions and they are not caveats. They are findings.

An investigation that publishes only what it found and hides what it could not find is advocacy. An investigation that publishes what it could not find, with specific document targets and the next retrieval steps, is a case file. This publication ships case files. Gaps in the record are part of the record.

The reader is treated as an adult. The documents are named. The tiers are visible. The contradictions are surfaced. The questions the record cannot answer are published. When the reader follows the citations into the vault, the chain of evidence is reproducible. That is the standard the work is built toward.