Knowledge Architecture & Risk Analysis

Session 4 & 5 Deliverables · RealSkin Data Moat Strategy

1. Knowledge Graph Architecture

Core Entities (Nodes)

| Entity | Attributes (Properties) | Validation Source |
|---|---|---|
| User (SkinIQ) | ID, Skin_Type, Fitzpatrick_Scale, Sensitivity, Primary_Concern, Hormonal_Pattern, Climate_Geo | User Input + Apple/Google Health API |
| Product | ID, Brand, Name, Category, Price, Active_Ingredients, Formulation_Type | Brand APIs + INCI Database |
| Ingredient | INCI_Name, Comedogenic_Rating, Irritation_Level, Function, Contraindications | Clinical/Dermatology Journals |
| Review | ID, User_ID, Product_ID, Star_Rating, Text, Time_Used, Verified_Purchase | Platform Generated |
| Dermatologist | ID, Name, Board_Cert_Number, Specialties, Clinic_Location | Medical Board API (NPI Database) |

Semantic Triples (Relationships)

(User_A) --[HAS_PROFILE]--> (SkinIQ_Profile)
(SkinIQ_Profile) --[LIVES_IN]--> (Climate_Humid)
(User_A) --[WROTE]--> (Review_123)
(Review_123) --[RATES]--> (Product_BHA)
(Product_BHA) --[CONTAINS]--> (Ingredient_SalicylicAcid)
(Ingredient_SalicylicAcid) --[CONTRAINDICATES]--> (Ingredient_Retinol)
(Dermatologist_DrMehta) --[VERIFIED]--> (SkinIQ_Profile)
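The triples above can be sketched as a minimal in-memory store. This is an illustrative Python sketch, not the production design; the set-of-tuples storage and the `objects` helper are assumptions standing in for a real graph database (e.g., Neo4j).

```python
# Minimal triple-store sketch of the RealSkin knowledge graph.
# Entity and relation names come from the semantic triples above;
# the storage layer (a set of tuples) is a hypothetical stand-in
# for a real graph database.

triples = {
    ("User_A", "HAS_PROFILE", "SkinIQ_Profile"),
    ("SkinIQ_Profile", "LIVES_IN", "Climate_Humid"),
    ("User_A", "WROTE", "Review_123"),
    ("Review_123", "RATES", "Product_BHA"),
    ("Product_BHA", "CONTAINS", "Ingredient_SalicylicAcid"),
    ("Ingredient_SalicylicAcid", "CONTRAINDICATES", "Ingredient_Retinol"),
    ("Dermatologist_DrMehta", "VERIFIED", "SkinIQ_Profile"),
}

def objects(subject: str, relation: str) -> set[str]:
    """Follow one edge type out of a node."""
    return {o for s, r, o in triples if s == subject and r == relation}

# Traversal example: which ingredients are in the products User_A reviewed?
reviewed = {p for r in objects("User_A", "WROTE") for p in objects(r, "RATES")}
ingredients = {i for p in reviewed for i in objects(p, "CONTAINS")}
print(ingredients)  # {'Ingredient_SalicylicAcid'}
```

The same two-hop traversal (User → Review → Product → Ingredient) is what lets match and safety logic reason over a user's history rather than over free text.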

2. "What We Don't Know" — AI & Domain Risk Analysis

The Risk Register (Failure Tree)

| Failure Mode | Severity | Trigger / "What We Don't Know" | Mitigation / Kill Criteria |
|---|---|---|---|
| Cold Start Graph Poisoning | HIGH | Without initial users, the match algorithm has no data. Do we simulate reviews? If we use AI to scrape/generate initial reviews, we violate our "Authenticity" pillar. | Mitigation: Launch in private beta to 500 Owen/Vanderbilt students. Manually seed the database with real humans. Kill Criteria: If we generate fake reviews, the brand dies. |
| Medical Hallucination (Routine Generator) | HIGH | The LLM Routine Generator hallucinates and tells a user to layer 20% AHA with 0.1% Tretinoin, causing severe chemical burns. | Mitigation: A hard-coded rule engine intercepts LLM output. The Generative Graph cannot output a routine without checking the [CONTRAINDICATES] edges in the Knowledge Graph. |
| Marketplace Contamination | MED | SecondSkin users sell expired or contaminated products, causing infections. | Mitigation: Only allow resale of products in "pump" packaging or sealed containers; ban jar packaging from the marketplace. Implement batch-code verification. |
| Health Data Privacy Liability | HIGH | Apple Health data (menstrual cycles, sleep) is highly sensitive post-Roe v. Wade. | Mitigation: Zero-knowledge architecture. Cycle tracking stays on-device; the platform receives only anonymous hash flags (e.g., Phase_3), never dates or medical records. |
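The rule-engine intercept for the medical-hallucination case can be sketched as a pairwise check against the [CONTRAINDICATES] edges. This is a hypothetical sketch: the `CONTRAINDICATES` set, function names, and the second example pair are illustrative assumptions, not the shipped rule set.

```python
# Sketch of the hard-coded rule engine that intercepts LLM routine output.
# In production, these pairs would be read from the dermatologist-verified
# [CONTRAINDICATES] edges of the Knowledge Graph, not hard-coded here.

CONTRAINDICATES = {
    frozenset({"Ingredient_SalicylicAcid", "Ingredient_Retinol"}),
    # Assumed additional pair for illustration (AHA + retinoid layering):
    frozenset({"Ingredient_GlycolicAcid", "Ingredient_Retinol"}),
}

def validate_routine(ingredients: list[str]) -> list[str]:
    """Return every contraindicated pair found in an LLM-generated routine."""
    conflicts = []
    for i, a in enumerate(ingredients):
        for b in ingredients[i + 1:]:
            if frozenset({a, b}) in CONTRAINDICATES:
                conflicts.append(f"{a} + {b}")
    return conflicts

# An unsafe routine is blocked before it ever reaches the user.
llm_routine = ["Ingredient_SalicylicAcid", "Ingredient_Niacinamide", "Ingredient_Retinol"]
conflicts = validate_routine(llm_routine)
if conflicts:
    print("BLOCKED:", conflicts)
```

The key design point is that this check sits outside the LLM: a deterministic gate the generative layer cannot talk its way around.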

AI Reliability Audit

Where LLMs Fail in Skincare: LLMs are trained on internet consensus. The internet believes "coconut oil is great for skin" because of SEO spam, despite it being highly comedogenic. A standard RAG pipeline over the open web inherits that noise, so RealSkin would give bad advice.

The Fix (Grounded AI): RealSkin's Generative Graph does not query the open web. It uses a strict RAG pipeline grounded only in our internal Ingredient Database (verified by derms) and our internal Review Graph. If a product isn't in our graph, the AI says "I don't know," rather than hallucinating.
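The refusal behavior can be sketched as a grounded lookup step. Everything here is illustrative: the database contents, ratings, and function names are assumptions standing in for the derm-verified Ingredient Database.

```python
# Sketch of the grounded retrieval step: answer only from the internal,
# dermatologist-verified ingredient database; refuse when the query
# falls outside the graph. Records and ratings are illustrative.

INGREDIENT_DB = {
    "Coconut Oil": {"comedogenic_rating": 4,
                    "note": "Highly comedogenic; poor fit for acne-prone skin."},
    "Salicylic Acid": {"comedogenic_rating": 0,
                       "note": "BHA exfoliant; well studied at OTC strengths."},
}

def grounded_answer(ingredient: str) -> str:
    record = INGREDIENT_DB.get(ingredient)
    if record is None:
        # Not in our verified graph: refuse rather than hallucinate.
        return "I don't know — this ingredient isn't in our verified database yet."
    return (f"{ingredient}: comedogenic rating "
            f"{record['comedogenic_rating']}/5. {record['note']}")

print(grounded_answer("Coconut Oil"))   # grounded answer from the internal DB
print(grounded_answer("Snail Mucin"))   # falls back to "I don't know"
```

Note the asymmetry with open-web RAG: here the retrieval corpus is closed, so a missing record produces a refusal instead of a plausible-sounding guess.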