Data model

Every record is a negative finding: a single, normalized statement that something failed, with the evidence to cite it.

Modality

The primary partitioning dimension — the kind of therapeutic the finding is about: small_molecule, clinical_trial, antibody, adc, bispecific, crispr, peptide, protac, oligonucleotide, vaccine. Filter on it first; the dataset and its indexes are organized around it.
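
As a rough illustration of filtering on modality first, the sketch below groups findings by modality before applying any narrower filter. It assumes findings arrive as plain dicts with a "modality" key. The partition_by_modality helper and the sample records are hypothetical and are not part of the dataset's API.

```python
from collections import defaultdict

# Modality values as listed in the text above.
MODALITIES = {
    "small_molecule", "clinical_trial", "antibody", "adc", "bispecific",
    "crispr", "peptide", "protac", "oligonucleotide", "vaccine",
}


def partition_by_modality(findings):
    """Group finding dicts by their 'modality' field (hypothetical helper)."""
    buckets = defaultdict(list)
    for finding in findings:
        modality = finding["modality"]
        if modality not in MODALITIES:
            raise ValueError(f"unknown modality: {modality!r}")
        buckets[modality].append(finding)
    return dict(buckets)


# Made-up sample records, for illustration only.
findings = [
    {"id": 1, "modality": "small_molecule", "outcome": "inactive"},
    {"id": 2, "modality": "clinical_trial", "outcome": "terminated"},
    {"id": 3, "modality": "small_molecule", "outcome": "failed_safety"},
]

by_modality = partition_by_modality(findings)

# Narrow to one modality first, then filter on other dimensions.
small_molecules = by_modality.get("small_molecule", [])
inactive = [f for f in small_molecules if f["outcome"] == "inactive"]
```

Partitioning up front keeps later filters cheap, since each one scans only the records of a single modality.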

Finding type & outcome

finding_type is the specific failure (e.g. inactive_compound, terminated_trial, failed_developability, failed_approval). outcome is the coarse result — inactive, terminated, failed_safety, failed_efficacy, rejected_approval, retracted.
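
One way to work with the coarse outcome field is to model it as a closed enumeration while leaving finding_type as a free string, since the text gives only examples of the latter. The sketch below assumes dict-shaped records. The Outcome class and count_outcomes helper are hypothetical names, not part of the dataset's API.

```python
from collections import Counter
from enum import Enum


class Outcome(str, Enum):
    """Coarse outcome values as listed in the text above."""
    INACTIVE = "inactive"
    TERMINATED = "terminated"
    FAILED_SAFETY = "failed_safety"
    FAILED_EFFICACY = "failed_efficacy"
    REJECTED_APPROVAL = "rejected_approval"
    RETRACTED = "retracted"


def count_outcomes(findings):
    """Tally findings per outcome; an unknown value raises ValueError."""
    return Counter(Outcome(f["outcome"]) for f in findings)


# Made-up sample records, for illustration only.
findings = [
    {"finding_type": "terminated_trial", "outcome": "terminated"},
    {"finding_type": "inactive_compound", "outcome": "inactive"},
    {"finding_type": "terminated_trial", "outcome": "terminated"},
]

counts = count_outcomes(findings)
```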

Target & compound

For protein-target small molecules, target_gene_symbol and target_family (kinase, GPCR, protease…) are populated; they are null for trials, antibodies, and other modalities. Compound-linked findings join to a normalized compound entity (structure, ChEMBL ID, max clinical phase) so one molecule resolves across sources.
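
The relationship between findings and the compound entity might be sketched as follows. Every name that does not appear in the text above is an assumption: the compound_id join key, the finding_id field, the representation of structure as a SMILES string, and the findings_by_compound helper. The ChEMBL ID and source names are placeholders, not real records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Compound:
    compound_id: int         # hypothetical join key
    chembl_id: str
    structure: str           # assumed here to be a SMILES string
    max_clinical_phase: int


@dataclass(frozen=True)
class Finding:
    finding_id: int          # hypothetical identifier
    modality: str
    source_type: str
    compound_id: Optional[int] = None         # None when not compound-linked
    target_gene_symbol: Optional[str] = None  # None outside protein-target small molecules
    target_family: Optional[str] = None


def findings_by_compound(findings):
    """Group compound-linked findings by compound_id; skip unlinked ones."""
    grouped = {}
    for finding in findings:
        if finding.compound_id is not None:
            grouped.setdefault(finding.compound_id, []).append(finding)
    return grouped


# Placeholder data, for illustration only.
compounds = {1: Compound(1, "CHEMBL_PLACEHOLDER", "CCO", 2)}
findings = [
    Finding(10, "small_molecule", "source_a", compound_id=1,
            target_gene_symbol="EGFR", target_family="kinase"),
    Finding(11, "small_molecule", "source_b", compound_id=1,
            target_gene_symbol="EGFR", target_family="kinase"),
    Finding(12, "clinical_trial", "source_c"),  # target fields stay None
]

grouped = findings_by_compound(findings)
```

Here two findings from different sources resolve to the same compound entry, while the trial finding has no compound link and leaves its target fields as None.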

Provenance

Every finding carries enough to verify and cite it:

  • source_type — the database it came from.
  • source_doi / source_pmid / source_url — the primary reference.
  • source_license — redistribution terms for that record.
  • verification_status — auto_published vs human_reviewed.
  • extraction_confidence — 0–1 score from the extraction pipeline.
  • source_retracted — whether the underlying paper was retracted.
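
A consumer could combine the provenance fields above into a simple citability check. The sketch below assumes dict-shaped records. The 0.8 confidence threshold, the preference order DOI, then PMID, then URL, and both helper names are illustrative choices, not rules defined by the dataset.

```python
def is_citable(finding, min_confidence=0.8):
    """Human-reviewed, not retracted, and above a confidence threshold.

    The default threshold is an arbitrary illustrative value.
    """
    return (
        finding["verification_status"] == "human_reviewed"
        and not finding["source_retracted"]
        and finding["extraction_confidence"] >= min_confidence
    )


def primary_reference(finding):
    """Return (field, value) for the first populated reference, or None."""
    for key in ("source_doi", "source_pmid", "source_url"):
        value = finding.get(key)
        if value:
            return key, value
    return None


# Made-up sample records with placeholder identifiers.
reviewed = {
    "verification_status": "human_reviewed",
    "source_retracted": False,
    "extraction_confidence": 0.93,
    "source_doi": None,
    "source_pmid": "00000000",
    "source_url": "https://example.org/record",
}
auto = {**reviewed, "verification_status": "auto_published"}
retracted = {**reviewed, "source_retracted": True}

citable = [f for f in (reviewed, auto, retracted) if is_citable(f)]
reference = primary_reference(reviewed)
```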

Try filtering across these dimensions on the live demo, or see them in the REST API reference.