AI vs Lab

From In Silico to In Vivo: Keeping AI honest in drug development

AI is undeniably transforming drug discovery: from pinpointing promising disease targets to sketching drug-like molecules, flagging safety risks, and streamlining workflows that once took months into days.

The story runs from early QSAR models, through the deep-learning boom of the 2010s, to today’s generative systems and structure-prediction breakthroughs like AlphaFold3. However, computer suggestions only go so far: they do not help patients until research laboratories prove they are real, safe, and developable. That is why the most interesting collaborations now pair AI with rapid experimental testing. For example, Takeda’s newly expanded partnership with Nabla Bio couples AI-designed protein therapeutics with rapid lab validation, backed by success-based milestone payments worth more than $1 billion [1]. The direction is clear: big pharma is connecting AI and assays into the same loop, not swapping one for the other.

A short history of AI in drug discovery

QSAR and the classical era (1960s–2000s)

Computational drug design has deep roots in therapeutic development. Quantitative structure–activity relationship (QSAR) models linked simple molecular features (such as hydrophobicity or electronic properties) with measured biological effects. These models helped chemists “predict before they make”, allowing them to prioritise which analogues to synthesise and test [2]. As the tools matured, basic machine-learning methods and structure-based docking were introduced, but accuracy and generalisability were limited by small, biased datasets and the sheer complexity of biological systems [3]. At this point, models were used to focus experimental work, not to replace it.
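To make the idea concrete, here is a minimal, self-contained sketch of the classical QSAR workflow: fit a linear model relating a couple of molecular descriptors to measured activity, then score an unmade analogue. All descriptor values, activities, and the new analogue are invented for illustration; real QSAR work uses computed descriptors and far larger, curated datasets.

```python
# Illustrative QSAR sketch: fit a linear model relating two hypothetical
# molecular descriptors (logP, molar refractivity) to measured pIC50.
# All numbers below are made up for demonstration.

def fit_linear_qsar(X, y):
    """Ordinary least squares via the normal equations (with intercept)."""
    n, p = len(X), len(X[0]) + 1
    A = [[1.0] + list(row) for row in X]        # prepend intercept column
    # Build A^T A and A^T y
    ata = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    aty = [sum(A[k][i] * y[k] for k in range(n)) for i in range(p)]
    # Solve the p x p system by Gauss-Jordan elimination
    for i in range(p):
        pivot = ata[i][i]
        ata[i] = [v / pivot for v in ata[i]]
        aty[i] /= pivot
        for j in range(p):
            if j != i:
                f = ata[j][i]
                ata[j] = [a - f * b for a, b in zip(ata[j], ata[i])]
                aty[j] -= f * aty[i]
    return aty  # [intercept, coef_logP, coef_MR]

# Toy training set: (logP, molar refractivity) -> measured pIC50
X = [(1.2, 30.0), (2.5, 55.0), (3.1, 42.0), (0.8, 25.0), (2.0, 60.0)]
y = [5.0, 6.1, 6.3, 4.7, 5.8]

coefs = fit_linear_qsar(X, y)

def predict(descriptors):
    """Score a (possibly not-yet-synthesised) analogue from its descriptors."""
    return coefs[0] + sum(c * d for c, d in zip(coefs[1:], descriptors))

# "Predict before you make": rank a hypothetical new analogue
print(round(predict((2.8, 48.0)), 2))
```

The point of the sketch is the workflow, not the model: measured data constrain a cheap predictor, and the predictor then decides which analogue is worth the bench time.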

The deep-learning decade (2010s)

As datasets grew and compute improved, deep neural networks entered the scene. A well-known example is AtomNet (2015), the first structure-based convolutional neural network for bioactivity prediction. AtomNet reported strong performance against docking benchmarks by learning 3D interaction features directly from protein–ligand structures [4]. In parallel, image-based phenotypic screening with deep learning [5] and sequence-to-sequence models for reaction prediction and retrosynthesis [6] arrived. Together, these advances delivered more accurate virtual screens, smarter hit triage, and early signs of closed-loop discovery accelerating drug development. However, the final word still lay with biochemical and cell-based assays.

Figure 1 AI in drug discovery timeline

Figure 1 – A timeline of AI in drug discovery. From QSAR models that linked simple molecular descriptors to biological activity, through the rise of deep learning, to today’s generative design and complex structure prediction.

What AI is doing today (and why it matters)

We are now very much in the “in silico + in vitro” era, where AI systems are integrated at most stages of drug development, e.g. proposing targets, molecules, and experimental designs.

Target identification and validation

Modern drug discovery starts with choosing tractable targets. AI can help here by integrating multi-omic data (genetics, transcriptomics, proteomics) and scientific literature into network-level models that prioritise likely causal mechanisms rather than loose correlations. A well-known example is baricitinib, originally developed for rheumatoid arthritis and later repurposed for COVID-19. An AI-driven knowledge graph at BenevolentAI highlighted baricitinib’s dual anti-viral and anti-inflammatory potential early in the pandemic. This signal was subsequently borne out by clinical trials and regulatory approvals. It stands as a public, real-world demonstration that AI can generate clinically useful hypotheses from noisy, fragmented data, rather than simply re-discovering what we already know [7].
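As a toy illustration of the principle (not BenevolentAI’s actual system), knowledge-graph triage can be sketched as ranking candidate targets by how many independent evidence types connect them to a disease. All triples, evidence labels, and the disease name below are invented; JAK1 is used only because baricitinib is a JAK inhibitor.

```python
# Illustrative knowledge-graph triage: rank candidate target genes by the
# number of distinct evidence types linking them to a (hypothetical) disease.
# Convergent, independent evidence suggests a more causal relationship than
# any single noisy signal — the intuition behind network-level prioritisation.

from collections import defaultdict

# (gene, evidence_type, disease) triples — all invented for demonstration
edges = [
    ("JAK1",  "genetic_association",      "DiseaseX"),
    ("JAK1",  "differential_expression",  "DiseaseX"),
    ("JAK1",  "literature_cooccurrence",  "DiseaseX"),
    ("GENE2", "literature_cooccurrence",  "DiseaseX"),
    ("GENE3", "differential_expression",  "DiseaseX"),
    ("GENE3", "literature_cooccurrence",  "DiseaseX"),
]

def rank_targets(edges, disease):
    """Score each gene by its count of distinct evidence types for the disease."""
    evidence = defaultdict(set)
    for gene, evidence_type, d in edges:
        if d == disease:
            evidence[gene].add(evidence_type)
    return sorted(evidence, key=lambda g: -len(evidence[g]))

print(rank_targets(edges, "DiseaseX"))
```

Real systems weight edge types, propagate signal through multi-hop paths, and learn from literature at scale, but the triage logic is the same: surface targets where several independent lines of evidence converge.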

Hit and lead generation

AI is also being used to design de novo molecules. A prominent example is Insilico Medicine’s idiopathic pulmonary fibrosis (IPF) programme. Their AI platform identified TNIK (Traf2- and Nck-interacting kinase) as a target and designed a novel small molecule, ISM001-055 (INS018-055), to modulate it. That compound has produced positive Phase 2a data in IPF patients, with a good safety profile and dose-dependent improvements in lung function. This is strong evidence that generative pipelines can produce genuinely clinic-bound candidates, not just interesting virtual hits [8].

The same principles now apply to proteins and antibodies. Platforms like Nabla Bio’s JAM (Joint Atomic Modelling) can design antibodies de novo from a target’s sequence or structure, generating therapeutic-grade leads in formats such as VHHs and full-length antibodies. These antibodies are then expressed and tested experimentally, with the model iteratively refined on the resulting data [1]. AI advances such as AlphaFold act as powerful enablers for these platforms. AlphaFold’s leap in single-protein structure prediction, recognised with the 2024 Nobel Prize in Chemistry, made accurate 3D models accessible for thousands of targets [9]. AlphaFold3 extends this to complexes and interactions – predicting complex structures that include protein–protein, protein–nucleic acid and protein–ligand assemblies. For discovery teams, this means richer structural context for both small molecules and biologics: better docking hypotheses, more realistic epitope models and clearer ideas about how a designed antibody might “see” its antigen [10].

Nevertheless, these models still generate hypotheses. Binding poses, interfaces and epitope maps all need to be confirmed in the lab before they can be used to make confident project decisions.

Lead optimisation and early developability

Even a brilliant in silico design is useless if you cannot make it, or if it fails basic “drug-likeness”. This is where the practical side of AI comes in, not just proposing molecules, but pressure-testing whether they are synthesisable and worth the next round of bench work. Commercial platforms such as Iktos are built around this idea. Makya is positioned as a generative design platform that aims to produce diverse, novel, and synthetically accessible molecules [11]. Spaya complements that by doing AI-driven retrosynthesis, turning a target compound into routes back to commercially available starting materials [12]. Rather than a human mapping every step from starting materials to final product, these systems learn from large reaction databases to propose plausible, often non-obvious, routes.

In parallel, a growing ecosystem of DMPK prediction models, e.g. ADMET Predictor by Simulations Plus, estimates properties such as solubility, metabolic stability and toxicity early in the process. Even if these predictions are imperfect, they help teams deprioritise compounds with obvious developability problems before they consume budget and time in the lab [13].

Importantly, none of these tools guarantee success, but together they cut avoidable dead-ends and keep experimental resources focused on molecules and biologics with a realistic chance of making it to the clinic. Every candidate that survives these in silico tests at each step still has to prove itself in subsequent real experiments.
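As a deliberately simplified illustration of this kind of triage, the sketch below applies Lipinski-style property filters with made-up thresholds and made-up compounds. Real platforms such as ADMET Predictor use calibrated statistical models rather than hard cut-offs, so treat this purely as a sketch of the workflow.

```python
# Illustrative developability triage: deprioritise compounds that fail simple
# rule-of-thumb property filters before they consume lab resources.
# Compound names, property values, and thresholds are all invented.

from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    mol_weight: float           # Da
    logp: float                 # predicted lipophilicity
    h_bond_donors: int
    pred_solubility_um: float   # hypothetical predicted solubility, in µM

def triage(c: Candidate) -> list[str]:
    """Return a list of developability flags; an empty list means 'advance'."""
    flags = []
    if c.mol_weight > 500:
        flags.append("high MW")
    if c.logp > 5:
        flags.append("high logP")
    if c.h_bond_donors > 5:
        flags.append("too many HBDs")
    if c.pred_solubility_um < 10:
        flags.append("poor predicted solubility")
    return flags

candidates = [
    Candidate("CMP-001", 342.0, 2.1, 1, 120.0),
    Candidate("CMP-002", 612.0, 6.3, 2, 4.0),
]
advanced = [c.name for c in candidates if not triage(c)]
print(advanced)
```

Even a filter this crude captures the economics: every flagged compound is bench time saved, while surviving compounds still owe their proof to real assays.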

Figure 2 Drug Discovery Stages

Figure 2 – Drug discovery stage map: AI supports target selection, hit and lead generation, and early developability triage, while wet-lab assays provide the evidence that confirms activity, mechanism, and risk. Linking the two into a single feedback loop turns predictions into decisions and improves the next modelling round.

To understand why AI must be coupled to experiments, it helps to look at where models still fall short and how real-world experiments can close the gap at each discovery stage.

Data problems (bias, sparsity, domain shift)

Drug discovery data can be messy: selective publication, lab-to-lab variability, and over-representation of certain chemotypes. Models trained on such data may look accurate on familiar scaffolds but generalise poorly to genuinely novel compounds. Researchers repeatedly flag this brittleness; the fix is careful curation, uncertainty tracking, and fresh experimental data to recalibrate the model [7].

Chemical realism (synthesisability and stability)

A generator can propose molecules that optimise a mathematical score yet are impossible or impractical to make, or chemically unstable. This is why modern workflows increasingly build “chemical realism” into the loop rather than treating synthesis as an afterthought. For example, Iktos positions Makya around designing molecules that are synthetically accessible, so outputs are closer to what a lab can actually build [11].

Overfitting and “hallucinations”

Like language models, AI models in drug discovery can be confidently wrong. A docking or activity-prediction network may latch onto spurious correlations; a generator might output plausible-looking but biologically irrelevant structures. Without a reality check, even very sophisticated systems can send programmes down expensive dead-ends [14]. The solution for these apparent weaknesses in AI is not to abandon it, but to put it into a tight feedback loop with experiments.

How experiments keep AI honest

Computational models clearly have limitations. For every key AI prediction, therefore, there should be a corresponding bench test to validate the result. Here are a few example stages where this can happen:

  • Hit validation. Any AI-proposed “hit” needs to be confirmed to really engage the target. Orthogonal biochemical or biophysical assays can achieve this: for example, enzyme activity readouts alongside SPR/ITC binding or thermal shift assays, to rule out artefacts such as aggregation or fluorescence interference. Only hits that can be reproduced across independent assays are worth carrying forward as leads.
  • Structural confirmation. If modelling (including AlphaFold-enabled docking) suggests a particular binding mode, this can be confirmed via structural biology (X-ray, cryo-EM, NMR). Mutagenesis studies can further test mechanistic hypotheses, e.g. does changing a predicted contact residue reduce potency?
  • DMPK and safety. Leads that hit the target and are predicted to be stable and safe must still be tested experimentally: metabolic stability (microsomes/hepatocytes), permeability (e.g. Caco-2/MDCK) and in vivo exposure, solubility and plasma protein binding, plus targeted safety panels such as hERG and other off-target screens. Computational DMPK models can flag risks, but only these experiments show how a specific molecule actually behaves in a biological system.
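The hit-validation step above can be sketched as a simple concordance filter: only hits that show activity in at least two independent assay formats advance. All compound names, readouts, and activity thresholds below are invented for illustration.

```python
# Illustrative orthogonal-assay filter: carry forward only hits reproduced in
# at least two independent assay formats. Names, readouts, and thresholds are
# invented; None marks an assay with no interpretable signal.

hits = {
    "HIT-01": {"enzyme_ic50_um": 0.8, "spr_kd_um": 1.2,  "thermal_shift_c": 2.5},
    "HIT-02": {"enzyme_ic50_um": 0.3, "spr_kd_um": None, "thermal_shift_c": 0.1},
}

def confirmed(readouts, min_assays=2):
    """True if the hit shows activity in >= min_assays independent assays."""
    checks = [
        readouts["enzyme_ic50_um"] is not None and readouts["enzyme_ic50_um"] < 10,
        readouts["spr_kd_um"] is not None and readouts["spr_kd_um"] < 50,
        readouts["thermal_shift_c"] is not None and readouts["thermal_shift_c"] > 1.0,
    ]
    return sum(checks) >= min_assays

leads = [name for name, readouts in hits.items() if confirmed(readouts)]
print(leads)
```

Here HIT-02 looks potent in the enzyme assay alone, which is exactly the pattern an aggregation or fluorescence artefact produces; requiring concordance across formats is what catches it.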
Figure 3 Closed-loop discovery in practice (DMTL)

Figure 3 – Closed-loop discovery in practice: Design, Make, Test, Learn. Models propose and prioritise candidates, chemistry and biology generate real measurements, and structured results feed back into the next iteration. This loop is how AI increases speed and breadth without replacing the experimental checks needed to keep programmes grounded in biology.

What this shows is that, in practice, AI never works in isolation. Every promising prediction is treated as the start of an experiment, not the end of one. Well-designed assays turn model output into hard evidence, and that evidence feeds back to refine the next round of models, closing the loop.
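A deliberately tiny, deterministic sketch of such a Design-Make-Test-Learn loop is shown below. The candidate "molecules" are two-feature vectors, the "wet lab" is a simulated assay whose true behaviour the model cannot see, and each cycle refits the model to all measurements so far. Everything here is invented for illustration.

```python
# Toy DMTL loop: the hidden lab truth is activity = 2*x0 - 1*x1. The model
# starts ignorant and re-estimates its weights from accumulated measurements
# after each cycle — a stand-in for retraining on fresh assay data.

def lab_assay(x):                      # Test: simulated experiment (hidden truth)
    return 2.0 * x[0] - 1.0 * x[1]

def rank(model, pool):                 # Design: score and rank candidates
    return sorted(pool, key=lambda x: -(model[0] * x[0] + model[1] * x[1]))

def refit(measured):                   # Learn: 2-parameter least squares, no intercept
    s00 = sum(x[0] * x[0] for x, _ in measured)
    s01 = sum(x[0] * x[1] for x, _ in measured)
    s11 = sum(x[1] * x[1] for x, _ in measured)
    t0 = sum(x[0] * y for x, y in measured)
    t1 = sum(x[1] * y for x, y in measured)
    det = s00 * s11 - s01 * s01
    if abs(det) < 1e-12:               # too little diversity to fit both weights
        return [t0 / s00 if s00 else 0.0, 0.0]
    return [(s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det]

pool = [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5), (0.7, 0.9), (0.1, 0.2)]
model, measured = [0.0, 0.0], []
while pool:                            # each iteration = one DMTL cycle
    pick = rank(model, pool)[0]        # Design: choose the most promising
    pool.remove(pick)                  # Make: "synthesise" it
    measured.append((pick, lab_assay(pick)))   # Test: run the assay
    model = refit(measured)            # Learn: update the model on real data
print([round(w, 2) for w in model])
```

After only a couple of cycles the refit recovers the hidden weights, which is the whole argument of the loop: measurements, not predictions, are what make the next round of designs better.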

Conclusion

AI is already adding value across the classic drug discovery path, helping teams prioritise targets, generate hits and leads, and triage chemistry for makeability and early developability. But the core principle has not changed. Models produce hypotheses, not proof, and biology is too noisy, variable, and context-dependent to trust predictions without evidence. Even major advances in structure prediction and molecular design are best viewed as accelerators of what to test, not substitutes for testing.

The winning approach is a disciplined loop: AI proposes and prioritises, the wet lab measures what is true, and those results refine the next round. That is how AI augments, rather than replaces, experimental drug development.

References

  1. Reuters. “U.S. biotech Nabla Bio, Japan’s Takeda expand AI drug design partnership.” 14 Oct 2025. https://www.reuters.com/business/healthcare-pharmaceuticals/us-biotech-nabla-bio-japans-takeda-expand-ai-drug-design-partnership-2025-10-14/; see also https://www.nabla.bio
  2. US EPA. “(Quantitative) Structure Activity Relationship [(Q)SAR] Guidance Document.” Last updated 30 Oct 2025. https://www.epa.gov/pesticide-registration/quantitative-structure-activity-relationship-qsar-guidance-document
  3. Pagadala, N. S., Syed, K., Tuszynski, J. “Molecular Docking: Shifting Paradigms in Drug Discovery.” International Journal of Molecular Sciences (2019) 20(18):4331. https://www.mdpi.com/1422-0067/20/18/4331
  4. Wallach, I., Dzamba, M., Heifets, A. “AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery.” arXiv (2015). https://arxiv.org/abs/1510.02855
  5. Trends in Biotechnology (ScienceDirect). “Deep learning for image-based phenotypic screening.” https://www.sciencedirect.com/science/article/abs/pii/S0962892422002628
  6. Schwaller, P. et al. “Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction.” ACS Central Science (2019). https://pubs.acs.org/doi/pdf/10.1021/acscentsci.9b00576
  7. Smith, D. P., Oechsle, O., Rawling, M. J., Savory, E., Lacoste, A. M. B., Richardson, P. J. “Expert-Augmented Computational Drug Repurposing Identified Baricitinib as a Treatment for COVID-19.” Frontiers in Pharmacology (2021). https://www.frontiersin.org/journals/pharmacology/articles/10.3389/fphar.2021.709856/full
  8. Insilico Medicine. “TNIK IPF Phase 2a” (company update/news page). https://insilico.com/news/tnik-ipf-phase2a
  9. Nobel Prize. “Press release: The Nobel Prize in Chemistry 2024.” 9 Oct 2024. https://www.nobelprize.org/prizes/chemistry/2024/press-release/
  10. Abramson, J. et al. “Accurate structure prediction of biomolecular interactions with AlphaFold 3.” Nature (2024). https://www.nature.com/articles/s41586-024-07487-w.pdf
  11. Iktos. “Makya by Iktos | Generative AI Platform for de novo Drug Design.” https://iktos.ai/solution/makya
  12. Iktos. “Spaya by Iktos | AI-Driven Retrosynthesis Platform.” https://iktos.ai/solution/spaya
  13. Simulations Plus. “ADMET Predictor.” https://www.simulations-plus.com/software/admetpredictor/
  14. Journal of Cheminformatics (BMC/Springer Nature). Article discussing risks such as evaluation artefacts and “hallucinations” in generative approaches (2025). https://link.springer.com/article/10.1186/s13321-025-01108-y