# Semantic Traversal & Origin Verification Engine
This POC currently mocks the heavy data-retrieval layers. To transition to a live, accurate engine, the following architecture must be deployed.
| Layer | Free Tier (Current) | Production / Paid Integration |
|---|---|---|
| Schema Extraction | Client-side Regex/Mock | OpenAI gpt-4o-mini (Cost: $0.15/1M input tokens) |
| Vector Similarity | Local string matching | Sentence-BERT embeddings + Pinecone Serverless |
| Surface Web Crawl | Hardcoded trace graph | GDELT Project (Free/S3) + Common Crawl |
| Dark/Fringe Crawl | Hardcoded trace graph | Telethon API (Telegram), 4plebs API, Pushshift (Reddit) |
| Screenshot Wall | Bypassed | Perceptual Hashing (Local) -> Gemini 1.5 Flash Vision (Triage OCR) |
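The vector-similarity upgrade in the table above comes down to comparing embeddings by cosine similarity instead of raw strings. A minimal sketch, using toy vectors in place of real Sentence-BERT embeddings (Pinecone would handle the nearest-neighbour search at scale):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dim "embeddings"; real Sentence-BERT vectors are 384- or 768-dim.
claim     = [0.9, 0.1, 0.0, 0.3]
candidate = [0.8, 0.2, 0.1, 0.3]
unrelated = [0.0, 0.9, 0.8, 0.1]

# A paraphrased candidate scores higher than an unrelated one,
# which string matching would miss entirely.
print(cosine_similarity(claim, candidate) > cosine_similarity(claim, unrelated))  # True
```

In production the same comparison runs inside the vector index; the point is that "similar claim" becomes a geometric question, not a lexical one.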
Never expose keys to the client. The frontend (this UI) must never call OpenAI, Pinecone, or Dataminr directly.
Implementation: Build a Backend-for-Frontend (BFF) using Node.js or Python (FastAPI).
The browser sends `{ "claim": "..." }` to your backend route `POST /api/trace`.
The backend server reads secrets from `.env`, orchestrates the LLM schema extraction, vector-DB query, and web crawls, then structures the Trace Graph JSON and returns it to the client.
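The orchestration step can be sketched as a single backend function. All sub-calls below are stubs standing in for the real OpenAI/Pinecone/crawler clients (the function names are illustrative, not real SDK calls), and the FastAPI route wiring is shown as a comment since it is boilerplate:

```python
import os

# Stub stages; in production each wraps a real client (LLM, vector DB, crawlers).
def extract_schema(claim: str) -> dict:
    # Would call the LLM (e.g. gpt-4o-mini) with a structured-output prompt.
    return {"entities": [], "claim_text": claim}

def query_vectors(schema: dict) -> list:
    # Would embed the claim and query the vector index for near-duplicates.
    return []

def crawl_sources(schema: dict) -> list:
    # Would fan out to GDELT / Common Crawl / Telegram / 4plebs adapters.
    return []

def trace_claim(claim: str) -> dict:
    """Orchestrate the pipeline and structure the Trace Graph JSON."""
    # Secrets stay server-side; they are never sent to the browser.
    api_key = os.environ.get("OPENAI_API_KEY", "")
    schema = extract_schema(claim)
    return {
        "claim": claim,
        "schema": schema,
        "matches": query_vectors(schema),
        "sources": crawl_sources(schema),
    }

# FastAPI wiring (sketch):
# @app.post("/api/trace")
# async def trace(body: TraceRequest):   # TraceRequest: pydantic model with `claim: str`
#     return trace_claim(body.claim)
```

The key design point is that `trace_claim` is pure orchestration: each stage can be swapped from mock to live client without touching the route handler.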
Rate limiting (e.g., via Upstash/Redis) is applied at the backend to prevent API abuse.
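The Upstash/Redis pattern referenced above is typically a fixed-window counter (Redis `INCR` plus `EXPIRE` per client key). A minimal in-memory stand-in, assuming a per-IP limit; in production the dictionary would be replaced by Redis so the count survives restarts and is shared across instances:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """In-memory stand-in for the Redis INCR/EXPIRE rate-limit pattern."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (client_id, window_index) -> hits

    def allow(self, client_id, now=None):
        now = time.time() if now is None else now
        window_index = int(now // self.window)  # same key for the whole window
        key = (client_id, window_index)
        self.counts[key] += 1
        return self.counts[key] <= self.limit

limiter = FixedWindowLimiter(limit=3, window_seconds=60)
results = [limiter.allow("1.2.3.4", now=100.0) for _ in range(4)]
print(results)  # [True, True, True, False]
```

The backend would call `allow()` at the top of the `/api/trace` handler and return HTTP 429 when it comes back `False`.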