Data model
Every record is a negative finding: a single, normalized statement that something failed, with the evidence to cite it.
Modality
The primary partitioning dimension — the kind of therapeutic the finding is about: small_molecule, clinical_trial, antibody, adc, bispecific, crispr, peptide, protac, oligonucleotide, vaccine. Filter on it first; the dataset and its indexes are organized around it.
Finding type & outcome
finding_type is the specific failure (e.g. inactive_compound, terminated_trial, failed_developability, failed_approval). outcome is the coarse result — inactive, terminated, failed_safety, failed_efficacy, rejected_approval, retracted.
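As a rough illustration of filtering on modality first and then narrowing by outcome, the sketch below models both fields as string enums built from the values listed above. The function name select_findings, the dict-shaped records, and the sample rows are assumptions made for illustration only; they are not the dataset's client API, and the enum members may not be exhaustive.

```python
from enum import Enum
from typing import Iterable, Iterator, Optional


class Modality(str, Enum):
    # Values copied from the modality list in this section.
    SMALL_MOLECULE = "small_molecule"
    CLINICAL_TRIAL = "clinical_trial"
    ANTIBODY = "antibody"
    ADC = "adc"
    BISPECIFIC = "bispecific"
    CRISPR = "crispr"
    PEPTIDE = "peptide"
    PROTAC = "protac"
    OLIGONUCLEOTIDE = "oligonucleotide"
    VACCINE = "vaccine"


class Outcome(str, Enum):
    # Values copied from the outcome list in this section.
    INACTIVE = "inactive"
    TERMINATED = "terminated"
    FAILED_SAFETY = "failed_safety"
    FAILED_EFFICACY = "failed_efficacy"
    REJECTED_APPROVAL = "rejected_approval"
    RETRACTED = "retracted"


def select_findings(
    findings: Iterable[dict],
    modality: Modality,
    outcome: Optional[Outcome] = None,
) -> Iterator[dict]:
    """Yield findings for one modality, optionally narrowed by outcome."""
    for finding in findings:
        # Modality is the primary partition, so check it first.
        if finding["modality"] != modality.value:
            continue
        if outcome is not None and finding["outcome"] != outcome.value:
            continue
        yield finding


# Made-up sample rows; the finding_type/outcome pairings are assumed.
SAMPLE = [
    {"modality": "small_molecule", "finding_type": "inactive_compound", "outcome": "inactive"},
    {"modality": "clinical_trial", "finding_type": "terminated_trial", "outcome": "terminated"},
    {"modality": "small_molecule", "finding_type": "failed_approval", "outcome": "rejected_approval"},
]

inactive_small_molecules = list(
    select_findings(SAMPLE, Modality.SMALL_MOLECULE, Outcome.INACTIVE)
)
```

Using enums rather than bare strings means a misspelled modality or outcome fails at the call site instead of silently matching nothing.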
Target & compound
For protein-target small molecules, target_gene_symbol and target_family (kinase, GPCR, protease…) are populated; they are null for trials, antibodies, and other modalities. Compound-linked findings join to a normalized compound entity (structure, ChEMBL ID, max clinical phase) so one molecule resolves across sources.
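The nullable target fields and the compound join can be pictured with a small sketch. Only target_gene_symbol, target_family, and source_type are field names taken from this page; compound_id, structure, chembl_id, max_clinical_phase, the group_by_compound helper, and every sample value are hypothetical placeholders, not the published schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Compound:
    # Hypothetical field names for the normalized compound entity.
    compound_id: str
    structure: str
    chembl_id: Optional[str]
    max_clinical_phase: Optional[int]


@dataclass(frozen=True)
class Finding:
    modality: str
    finding_type: str
    source_type: str
    # Populated for protein-target small molecules, None otherwise.
    target_gene_symbol: Optional[str] = None
    target_family: Optional[str] = None
    # Hypothetical join key; None when no compound is linked.
    compound_id: Optional[str] = None


def group_by_compound(findings: Iterable[Finding]) -> Dict[str, List[Finding]]:
    """Group compound-linked findings under their shared compound key."""
    grouped: Dict[str, List[Finding]] = {}
    for finding in findings:
        if finding.compound_id is not None:
            grouped.setdefault(finding.compound_id, []).append(finding)
    return grouped


# Placeholder data: two sources reporting on the same molecule, plus a trial.
COMPOUNDS = {
    "cmp-1": Compound("cmp-1", structure="CCO", chembl_id=None, max_clinical_phase=None),
}
FINDINGS = [
    Finding("small_molecule", "inactive_compound", "source_a",
            target_gene_symbol="EGFR", target_family="kinase", compound_id="cmp-1"),
    Finding("small_molecule", "inactive_compound", "source_b",
            target_gene_symbol="EGFR", target_family="kinase", compound_id="cmp-1"),
    Finding("clinical_trial", "terminated_trial", "source_c"),
]

by_compound = group_by_compound(FINDINGS)
```

Here both small-molecule findings resolve to the single entry in COMPOUNDS, while the trial finding keeps its target fields as None and is left out of the grouping.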
Provenance
Every finding carries enough to verify and cite it:
source_type: the database it came from.
source_doi / source_pmid / source_url: the primary reference.
source_license: redistribution terms for that record.
verification_status: auto_published vs human_reviewed.
extraction_confidence: 0–1 score from the extraction pipeline.
source_retracted: whether the underlying paper was retracted.
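One way a consumer might use these provenance fields is sketched below: pick the best available reference and screen out records that are retracted or low-confidence. The helper names, the DOI-then-PMID-then-URL preference order, and the 0.8 default threshold are assumptions chosen for illustration, not recommendations from the dataset.

```python
from typing import Optional


def primary_reference(finding: dict) -> Optional[str]:
    """Return one citable reference, preferring DOI, then PMID, then URL."""
    if finding.get("source_doi"):
        return f"doi:{finding['source_doi']}"
    if finding.get("source_pmid"):
        return f"pmid:{finding['source_pmid']}"
    return finding.get("source_url")


def passes_provenance_checks(
    finding: dict,
    min_confidence: float = 0.8,  # assumed threshold, tune for your use case
    require_human_review: bool = False,
) -> bool:
    """Screen a finding on retraction, review status, and extraction confidence."""
    if finding.get("source_retracted"):
        return False
    if require_human_review and finding.get("verification_status") != "human_reviewed":
        return False
    confidence = finding.get("extraction_confidence")
    # Treat a missing score as failing the check rather than guessing.
    return confidence is not None and confidence >= min_confidence


# Made-up record with placeholder identifiers.
example = {
    "source_doi": "10.1234/example",
    "verification_status": "auto_published",
    "extraction_confidence": 0.9,
    "source_retracted": False,
}

reference = primary_reference(example)
accepted = passes_provenance_checks(example)
strictly_accepted = passes_provenance_checks(example, require_human_review=True)
```

With these settings the example record passes the default screen but is rejected once human review is required, since it is only auto_published.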
Try filtering across these dimensions on the live demo, or see them in the REST API reference.