System Architecture // Optimization

Solving Speed, Scale, and Access

How to prevent a deep temporal traversal from taking 10 minutes, bloating your database, and hitting API rate limits.

1. The Speed Problem

You can't traverse the internet for every query; if a deep trace takes 5 minutes, an agent cannot wait for it. The saving grace is that viral claims are a finite, heavily repeated set: millions of people ask the exact same question, so most answers can be cached. You solve this with a Tiered Verification Pipeline.

Tier 1: Schema Cache Hit
Latency: 15ms

Extract schema. Check local vector DB. Has this claim been traced before? If yes, instantly return the cached Time-Zero Passport.

Tier 2: LLM Fast Heuristic
Latency: 1.5s

If no cache hit, query the LLM ensemble for a general consensus on origin. Return a "Provisional Rating" so the agent can act immediately.

Tier 3: Asynchronous Deep Trace
Latency: 2-10 mins

The deep dark-web/GDELT trace runs in the background. Once it finds Time Zero, it updates the Tier 1 cache for all future agents.

2. The KG Bloat Problem

You don't store the whole internet. You don't build a massive Knowledge Graph of all data. You build an Ephemeral Trace Graph that dies, leaving behind only the Passport.

When a query triggers Tier 3, the system spins up a temporary graph to map the nodes (Twitter -> Telegram -> 4chan). Once it identifies the absolute Time Zero node, it extracts the origin rating, saves the final Passport to your Postgres/Vector DB, and deletes the traversal graph.

// What you actually store:
{
  schema_hash: "immigrant_eat_pet_us",
  time_zero_node: "fb_private_grp_screenshot",
  time_zero_date: "2024-08-10T14:32:00Z",
  origin_rating: { inst: 0.1, fringe: 0.95 }
}
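The build-extract-discard lifecycle can be sketched as a function whose trace graph is a local variable: it exists only for the duration of the call, and only the passport escapes. The node names, sighting format, and use of a plain dict as the graph are illustrative assumptions.

```python
from datetime import datetime, timezone

def run_deep_trace(sightings: list[dict]) -> dict:
    """Build an ephemeral trace graph, extract the Time-Zero Passport,
    and let the graph be garbage-collected on return."""
    # Ephemeral graph: node -> earliest timestamp at which it was seen.
    graph: dict[str, datetime] = {}
    for s in sightings:
        node, ts = s["node"], s["seen_at"]
        if node not in graph or ts < graph[node]:
            graph[node] = ts

    # The absolute Time Zero is the earliest node in the traversal.
    time_zero_node = min(graph, key=graph.get)
    passport = {
        "time_zero_node": time_zero_node,
        "time_zero_date": graph[time_zero_node].isoformat(),
    }
    return passport  # the graph dies here; only the passport survives
```

In production the returned passport is what gets written to Postgres/the vector DB; the traversal graph itself is never persisted.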

3. The "Sign-In" Problem

You don't build 50 scrapers. Managing API keys, rate limits, and auth tokens for Telegram, Reddit, 4chan, X, and Facebook is a nightmare and will get your IPs banned.

The Solution: Threat Intel Firehoses. You buy API access to aggregators that already scrape the dark/fringe web legally. Companies like Dataminr, Meltwater, or Recorded Future already have pipelines into Telegram, Gab, 4chan, and Reddit. You don't scrape the web; you query their APIs using your Schema Vectors.

Surface: GDELT (Free bulk files), Common Crawl (Free AWS S3)
Social/Dark: Meltwater / Dataminr API (Paid enterprise access)
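Querying an aggregator instead of scraping reduces your integration surface to one authenticated HTTP client. A hedged sketch: the endpoint path, request body, and source names below are placeholders, not any real vendor's API; consult your aggregator's actual API reference before building against it.

```python
import json
import urllib.request

def build_firehose_request(base_url: str, api_key: str,
                           schema_vector: list[float]) -> urllib.request.Request:
    """Build a POST to a hypothetical aggregator search endpoint,
    sending the Schema Vector instead of a keyword query."""
    body = json.dumps({
        "vector": schema_vector,                      # your Schema Vector
        "sources": ["telegram", "4chan", "gab"],      # illustrative source list
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/search",                      # placeholder path
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

The request would then be sent with `urllib.request.urlopen` (or any HTTP client) and the JSON results fed into the Tier 3 trace.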