The Gene Box
The science

Not scraped. Not hallucinated. Curated.

TGB's knowledge graph is built by scientists, not crawlers. Every association is sourced, graded, and validated before it informs a clinical report.

150,000+

Curated references

14,000+

Knowledge graph nodes

100M+

Mapped relationships

1,200+

Biomarkers tracked

Curation pipeline

Five stages from source to insight

Raw data doesn't become clinical intelligence on its own. Our pipeline enforces quality at every step.

1

Source

Systematic ingestion from 20+ authoritative databases: ClinVar, OMIM, PharmGKB, PubMed, CPIC, USDA, and more.

2

Extract

Structured parsing of gene-variant-disease-drug-nutrient relationships using domain-specific NLP pipelines.

3

Normalize

Ontology mapping to HGNC gene symbols, SNOMED, ICD-10, RxNorm, and MeSH, ensuring cross-source consistency.

4

Validate

Expert review by clinical geneticists and molecular biologists. Every association is graded by evidence level.

5

Publish

Versioned release into the Evolveme.ai knowledge graph, with audit trails and change logs for every node.
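As a rough illustration of how stages 3 to 5 could fit together, here is a minimal sketch in Python. It is not TGB's implementation: the `Association` record, the `GENE_SYMBOLS` lookup, and the `normalize`, `validate`, and `publish` functions are all hypothetical names chosen for this example, and the reference value used with it is a placeholder rather than a real PMID.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative lookup only; entries are not checked against the real HGNC alias set.
GENE_SYMBOLS = {"CYP2D6": "CYP2D6", "MTHFR": "MTHFR"}


@dataclass
class Association:
    gene: str
    target: str                          # disease, drug, or nutrient label
    source_db: str                       # stage 1: database the claim came from
    reference: str                       # PMID or DOI of the primary source
    evidence_level: Optional[str] = None
    audit_trail: List[str] = field(default_factory=list)


def normalize(assoc: Association) -> Association:
    """Stage 3: map the raw gene string onto a canonical symbol."""
    key = assoc.gene.strip().upper()
    if key not in GENE_SYMBOLS:
        raise ValueError(f"unmapped gene symbol: {assoc.gene!r}")
    assoc.gene = GENE_SYMBOLS[key]
    assoc.audit_trail.append("normalized")
    return assoc


def validate(assoc: Association, reviewer: str, evidence_level: str) -> Association:
    """Stage 4: record an expert's evidence grade; refuse unsourced claims."""
    if not assoc.reference:
        raise ValueError("association has no primary source")
    assoc.evidence_level = evidence_level
    assoc.audit_trail.append(f"validated by {reviewer}")
    return assoc


def publish(assoc: Association, version: str) -> dict:
    """Stage 5: emit a versioned record; ungraded associations are blocked."""
    if assoc.evidence_level is None:
        raise ValueError("cannot publish an ungraded association")
    assoc.audit_trail.append(f"published in {version}")
    return {
        "version": version,
        "gene": assoc.gene,
        "target": assoc.target,
        "evidence_level": assoc.evidence_level,
        "reference": assoc.reference,
        "audit_trail": list(assoc.audit_trail),
    }
```

The design point is that each stage appends to the audit trail and the publish step refuses anything that skipped validation, so a released record always carries its own history.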

Evidence sources

Where the knowledge comes from

Six domains. Twenty-plus authoritative databases. All systematically integrated and cross-validated.

Genomic databases

Clinically validated variant interpretations and gene-disease associations.

ClinVar · OMIM · ClinGen

Pharmacogenomics

Drug-gene interaction evidence and clinical dosing guidelines.

PharmGKB · CPIC · DPWG

Literature

Peer-reviewed publications and systematic reviews across all domains.

PubMed · PMC · Cochrane

Nutrition & Lifestyle

Nutrient composition data and dietary reference intakes.

USDA · NutritionData

Microbiome

Gut microbiome diversity, strain data, and disease associations.

GMrepo · gutMDisorder · BacDive

Clinical guidelines

Professional society recommendations for variant interpretation and reporting.

NCCN · ACMG · AMP

Why it matters

Scraped vs curated — not the same thing

Most AI health tools rely on scraped or LLM-generated content. TGB is different.

Scraped / LLM-generated

  • No audit trail — impossible to verify source or date
  • Hallucinations present clinically plausible but false associations
  • No evidence grading — all claims treated equally
  • Outdated at deployment; no versioned update process
  • Not built for clinical or regulatory accountability
  • May ingest retracted papers or predatory journal content

TGB Curated

  • Every association traceable to a primary source with PMID or DOI
  • Clinical expert review at the validation stage
  • Evidence graded: pathogenic / likely pathogenic / uncertain / benign
  • Versioned releases with full change logs and deprecation notices
  • Built to ISO 9001:2015 standards with ISO 13485 certification in process
  • Cochrane and CPIC guidelines prioritised for therapeutic claims
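
To make the traceability and grading points concrete, here is a minimal sketch of a pre-publication check. It assumes references are stored as strings such as `PMID:<digits>` or a bare DOI; that storage format, the two regular expressions, and the `check_association` helper are assumptions made for this example, not a description of TGB's internal schema. The grade list is taken from the bullet above.

```python
import re

# Grades as listed in the page copy.
GRADES = ("pathogenic", "likely pathogenic", "uncertain", "benign")

# Assumed reference formats: "PMID:" plus up to 8 digits, or a DOI starting "10.".
PMID_RE = re.compile(r"^PMID:\d{1,8}$")
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def is_traceable(reference: str) -> bool:
    """True when the reference looks like a PMID or a DOI."""
    return bool(PMID_RE.match(reference) or DOI_RE.match(reference))


def check_association(grade: str, reference: str) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if grade not in GRADES:
        problems.append(f"unknown evidence grade: {grade!r}")
    if not is_traceable(reference):
        problems.append(f"reference is not a PMID or DOI: {reference!r}")
    return problems
```

A record such as `check_association("benign", "PMID:00000000")` (a placeholder identifier) returns no problems, while a free-text source with an unlisted grade is flagged on both counts. A format check like this only confirms that a citation is present and well formed; confirming that the cited paper supports the claim remains the expert reviewer's job.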
Knowledge graph

14,000+ nodes. 100M+ relationships.

Genes connect to diseases, drugs, nutrients, lifestyle factors, and microbiome strains — all in a single traversable graph.

Gene 4,200+ · Disease 2,800+ · Drug 1,900+ · Nutrient 1,200+ · Lifestyle 860+ · Flora 1,600+ · Biomarker 950+

Simplified visualisation — actual graph contains 14,000+ nodes across 8 entity types
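
"Traversable" here means that starting from one node you can walk outward to everything connected to it. A minimal sketch of that idea, assuming the graph is held as a plain adjacency dictionary: the node IDs below (`gene:G1`, `drug:R1`, and so on) are placeholders, not real entries, and `neighbours_within` is a hypothetical helper written for this example.

```python
from collections import deque

# Placeholder graph: "type:id" nodes with outgoing links to other nodes.
EDGES = {
    "gene:G1": ["disease:D1", "drug:R1"],
    "drug:R1": ["nutrient:N1"],
    "nutrient:N1": ["flora:F1"],
    "disease:D1": [],
    "flora:F1": [],
}


def neighbours_within(start: str, max_hops: int) -> dict:
    """Breadth-first walk returning each reachable node and its hop distance."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in EDGES.get(node, []):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    del hops[start]
    return hops


print(neighbours_within("gene:G1", 2))
# {'disease:D1': 1, 'drug:R1': 1, 'nutrient:N1': 2}
```

With a two-hop limit, the walk from the gene reaches the disease and drug directly and the nutrient through the drug, while the flora node stays out of range at three hops.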

Credentials

Recognised by the industry's standard-setters

Illumina recommended

Certified partner laboratory

ThermoFisher recommended

Validated assay pipelines

ISO 9001:2015

Certified

GDPR compliant

EU data protection standards

Science you can stand behind

See how the knowledge graph powers clinical-grade reports across genomics, microbiome, and blood intelligence.