Biotech and Life Sciences

Augmenting Drug Discovery with Artificial Intelligence

Jacek Szmatka

Published: 2024/02/01

7 min read

Artificial intelligence (AI) is profoundly changing drug discovery, a critical area of the biotech industry. Developing new drugs is a time-consuming and expensive process that often involves trial and error. Moreover, the low-hanging fruit in this field is gone, so researching new medicines has become even more challenging and costly. Advances in high-throughput biological and chemical data generation have enabled detailed disease characterization, often down to the single-cell level. The other side of the coin is that the amount of data goes far beyond what a human being can study, making AI a valuable ally in the modern drug discovery process and in the development of biotechnology software solutions.

With AI, scientists can use machine learning and deep learning algorithms to analyze vast datasets, predict drug-target interactions and identify potential drug candidates faster and more accurately. These algorithms learn from large amounts of data and make predictions based on patterns and trends. They can identify new drug targets, dock molecules and even design new compounds with generative AI. Finally, high-throughput wet-lab screening can be augmented with AI and run as a closed loop, so that each round of experiments quickly improves the models.

Generative AI takes the lead… compound generation

Let’s examine a few instances where AI has been successfully applied to the drug discovery process. For example, a recently published update to AlphaFold, an algorithm from Google DeepMind previously famous for predicting complex protein structures, has now outperformed long-established industry standards such as AutoDock Vina at molecular docking, the task of identifying drug molecules that can bind to specific target proteins. The most intrepid AI-driven drug discovery companies have also developed similar tools for their own use. The AI algorithms they rely on already go beyond docking, as they can design new compounds with desired biological activity and chemical characteristics.

For example, the AI-chemistry company Iktos developed Makya, its generative AI-driven de novo design software for multi-parametric optimization (MPO), which led to a collaboration with Pfizer in 2021. Another biotechnology company, Insilico Medicine, famous for leading the race to develop the first drug discovered and designed by AI (now in phase II clinical trials), created Chemistry42, a generative AI-driven active learning platform that, according to Insilico, can discover novel lead-like structures in days, as unlikely as it sounds.

Target discovery unbiased by humans

Target discovery is where the medical revolution begins, as pinpointing the root cause of a disease and the right target to go after opens the door to new and better therapies. Finding such a target involves studying vast amounts of data. Nowadays, scientific data comes from a wide range of experiments and is often available in the public domain, written up by scientists and published in scientific journals. The key is to integrate all of that data into the answers you seek. It’s a Mount Everest-size problem for scientists, and a task well suited to AI. It is therefore unsurprising that there are systems already in place to accumulate and integrate omics data, which includes results across domains such as epigenetics, genomics, transcriptomics, proteomics and metabolomics, sometimes even at the single-cell level. Such systems also annotate observations with information found in biological and disease databases, or taken straight from scientific papers, thus enabling scientists to understand the disease and find its weak points.

What are some of the practical results? Insilico developed PandaOmics, a software tool for analyzing and interpreting harmonized omics data. The software ranks, scores and evaluates gene targets for a disease of interest and supports AI-driven hypothesis generation. The tool also includes a pathway analysis approach to infer pathway inhibition or activation. Insilico even mentions that when analyzing significant genes in a given experiment, it collects information about them from a total of 30 million publications, 3.8 million patents and 3 million grants in the life sciences space. BenevolentAI, a company that inked multi-million-dollar drug discovery deals with AstraZeneca and Merck KGaA, describes its BenAI engine as providing a holistic view of the disease and uncovering novel insights thanks to domain-specific AI.
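To make the idea of ranking and scoring targets more concrete, here is a minimal Python sketch that combines several evidence scores into one number and sorts genes by it. It is an illustration only and does not reflect how PandaOmics or the BenAI engine work internally. The gene names, the three evidence types, the scores and the weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TargetEvidence:
    """Evidence for one candidate gene; every score is assumed to be scaled to 0..1."""
    gene: str
    omics_score: float       # hypothetical: strength of the signal in omics data
    literature_score: float  # hypothetical: support found in publications
    pathway_score: float     # hypothetical: involvement in disease pathways


# Hypothetical weights; a real platform would tune or learn these.
WEIGHTS = {"omics": 0.5, "literature": 0.3, "pathway": 0.2}


def combined_score(target: TargetEvidence) -> float:
    """Weighted sum of the individual evidence scores."""
    return (
        WEIGHTS["omics"] * target.omics_score
        + WEIGHTS["literature"] * target.literature_score
        + WEIGHTS["pathway"] * target.pathway_score
    )


def rank_targets(targets: list[TargetEvidence]) -> list[TargetEvidence]:
    """Return the targets ordered from the highest to the lowest combined score."""
    return sorted(targets, key=combined_score, reverse=True)


# Made-up example data, not real genes or measurements.
candidates = [
    TargetEvidence("GENE_A", omics_score=0.9, literature_score=0.2, pathway_score=0.5),
    TargetEvidence("GENE_B", omics_score=0.4, literature_score=0.9, pathway_score=0.8),
    TargetEvidence("GENE_C", omics_score=0.1, literature_score=0.1, pathway_score=0.1),
]

ranking = rank_targets(candidates)
for target in ranking:
    print(f"{target.gene}: {combined_score(target):.2f}")
```

With these made-up numbers GENE_B ranks first, because strong literature and pathway support outweigh its weaker omics signal. A weighted sum is the simplest possible choice; it makes the trade-off between evidence types explicit, which is useful when scientists need to understand why a target was ranked highly.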

Connecting the information dots with graphs

One of the most efficient ways to integrate all the information humans generate about health and disease is to build graphs and take advantage of large language models (LLMs). In a graph-based approach, datasets are linked by their relations, which can be established from existing knowledge bases (e.g., gene pathways) or from experimental data (e.g., data from the same sample). AI can provide invaluable help in dealing with such enormous amounts of data by organizing it, detecting noise or anomalies and providing advanced search tools for generating novel hypotheses about a disease cause, a suitable target or a good biomarker.
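The graph-based approach can be sketched in a few lines of Python. The example below is a toy illustration, not the design of any product mentioned in this article: the `KnowledgeGraph` class, the entity names and the relation names are all hypothetical. It stores typed relations between entities and uses a breadth-first search to find the shortest chain of relations connecting two entities, which is one simple way to surface a hypothesis such as "this drug may be relevant to this disease".

```python
from collections import defaultdict, deque


class KnowledgeGraph:
    """Toy knowledge graph: entities are nodes, typed relations are edges."""

    def __init__(self):
        # node -> list of (relation, neighbor) pairs
        self._edges = defaultdict(list)

    def add_relation(self, source, relation, target):
        """Store the relation and its inverse so the graph can be walked both ways."""
        self._edges[source].append((relation, target))
        self._edges[target].append((f"inverse:{relation}", source))

    def shortest_path(self, start, goal):
        """Breadth-first search; returns [node, relation, node, ...] or None."""
        queue = deque([(start, [start])])
        visited = {start}
        while queue:
            node, path = queue.popleft()
            if node == goal:
                return path
            for relation, neighbor in self._edges.get(node, []):
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append((neighbor, path + [relation, neighbor]))
        return None


# Hypothetical entities and relations, e.g. merged from a pathway database
# and from experimental results.
kg = KnowledgeGraph()
kg.add_relation("drug_X", "inhibits", "gene_A")
kg.add_relation("gene_A", "member_of", "pathway_P")
kg.add_relation("pathway_P", "dysregulated_in", "disease_D")

path = kg.shortest_path("drug_X", "disease_D")
print(" -> ".join(path))
```

Production knowledge graphs hold billions of relations and rely on graph databases and learned embeddings rather than an in-memory dictionary, but the principle of connecting datasets through shared entities is the same.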

A good example of graphs being used in this way is the Continuously Updated Relation Integration Engine (CURIE) created by the bioinformatics company Data4Cure. According to its inventors, it captures over 2 billion relations among more than 300 thousand entities spanning 7 domains, including cell types, organs and tissues, drugs and more. The graph can help researchers understand disease mechanisms and drug modes of action, identify and validate targets, reposition drugs and even provide context for interpreting clinical trial results.

Large language models for drug discovery

How can biotech companies benefit from large language models (LLMs)? LLMs have been all the rage for the last few months, with platforms like ChatGPT from OpenAI and Bard from Google, which is built on PaLM. Implementing an LLM seems fitting for ingesting large numbers of scientific papers and helping to answer specific questions that scientists might have. However, general-purpose LLMs are imperfect and known for hallucinating, which means giving inaccurate answers to some questions. To avoid such outcomes, organizations build and implement smaller, fit-for-purpose models. One example is Ferma.ai – a chatbot created with healthcare data.

Furthermore, when you treat nucleic acid or amino acid sequences as a language, LLMs can provide very useful functions. Biotech company Atomic AI created an LLM focused on RNA drug discovery. The tool’s task is to design RNA therapeutics by optimizing their stability, toxicity and translational efficiency. The company’s model is ATOM-1, a proprietary platform component that maps chemical data to predict RNA’s structure and function. There are many other projects related to LLMs in biotech. In 2022, NVIDIA unveiled a set of LLMs dedicated to chemistry and biology under the umbrella of the BioNeMo framework: ESM-1, OpenFold, MegaMolBART and ProtT5.
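What "treating a sequence as language" means in practice can be shown with a small Python sketch. A language model needs its input split into tokens and mapped to integer ids; for biological sequences, one common option is to use overlapping k-mers as the "words". The example below is an assumption-laden illustration: the function names are hypothetical, the RNA string is made up, and the models named above use their own tokenizers, which may work differently (for example, one token per residue).

```python
def kmer_tokens(sequence: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a sequence into overlapping k-mers, the 'words' of the sequence."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def build_vocab(sequences: list[str], k: int = 3) -> dict[str, int]:
    """Assign an integer id to every distinct k-mer, in order of first appearance."""
    vocab: dict[str, int] = {}
    for seq in sequences:
        for token in kmer_tokens(seq, k):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(sequence: str, vocab: dict[str, int], k: int = 3) -> list[int]:
    """Turn a sequence into token ids, skipping k-mers the vocabulary has not seen."""
    return [vocab[t] for t in kmer_tokens(sequence, k) if t in vocab]


# Made-up RNA fragment used only for illustration.
rna = "AUGGCU"
tokens = kmer_tokens(rna)   # ['AUG', 'UGG', 'GGC', 'GCU']
vocab = build_vocab([rna])
ids = encode(rna, vocab)    # [0, 1, 2, 3]
print(tokens, ids)
```

Once sequences are expressed as token ids, the same model architectures used for text can be trained on them to learn patterns that relate sequence to structure or function.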

Image analysis in the life sciences

Image analysis is the field of life sciences in which machine learning and AI have been making a big impact for the longest time. Deep neural networks made headlines with diagnostic applications that famously outperformed humans in melanoma diagnosis, as described in Scientific Reports in September 2021. A few years earlier, AI created a stir in radiology, as described in Nature Reviews Cancer in 2018. It is no surprise that image analysis also applies to drug discovery through approaches like phenotypic screening. In such a screening, multiple pictures or videos of cells and tissues are collected under the tested conditions. The pharmaceutical company Daiichi Sankyo uses LPIXEL’s IMACEL platform to combine phenotypic screening with AI for quantitative assessment of drug responsiveness in the discovery phase, toxicity assessment in preclinical development and the grouping of patients for clinical trials.

(Almost) a self-driving drug discovery

There is an even more advanced concept for finding novel insights into diseases, including potential targets: letting the AI steer the experiments itself. Although not as advanced as self-driving cars, the algorithms can identify gaps in the data and help design experiments that improve the understanding of the problem at hand. The Recursion Operating System implements this idea, which is enabling the company to reshape the drug discovery funnel. Its approach increases the number of potential therapeutic starting points and narrows them down relatively quickly, saving costs and accelerating the whole process. Biotechnology corporation Genentech shares a similar story about its collaboration with NVIDIA, emphasizing that thanks to AI and a “lab in the loop” approach, it is opening doors to what others consider impossible.
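The core step of such a loop, deciding which experiment to run next, can be illustrated with a short Python sketch. It uses uncertainty sampling, a standard active learning heuristic: several models predict an outcome for each untested candidate, and the candidate on which they disagree most is sent to the lab. This is a simplified assumption for illustration and not a description of the Recursion or Genentech systems; the function name, compound names and prediction values are hypothetical.

```python
from statistics import pvariance


def pick_next_experiment(ensemble_predictions, already_tested=frozenset()):
    """Return the untested candidate whose predictions disagree the most.

    ensemble_predictions maps a candidate name to the activity predicted by
    each model in an ensemble. High variance marks a gap in what the models
    know, so testing that candidate in the lab is the most informative step.
    """
    untested = {
        name: preds
        for name, preds in ensemble_predictions.items()
        if name not in already_tested
    }
    if not untested:
        return None
    return max(untested, key=lambda name: pvariance(untested[name]))


# Hypothetical predictions from three models for three compounds.
predictions = {
    "compound_1": [0.90, 0.88, 0.91],  # models agree: likely active
    "compound_2": [0.20, 0.80, 0.50],  # models disagree: a gap in the data
    "compound_3": [0.10, 0.12, 0.11],  # models agree: likely inactive
}

first_choice = pick_next_experiment(predictions)
second_choice = pick_next_experiment(predictions, already_tested={"compound_2"})
print(first_choice, second_choice)
```

In a full loop, the measured result for the chosen compound would be added to the training data, the models retrained and the selection repeated, so every round of wet-lab work targets what the models understand least.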

Challenges of AI in drug discovery

While integrating AI into the drug discovery process brings immense opportunities, it also comes with justified limitations rooted in ethical considerations. The key issues that need to be addressed are privacy, security and bias. Privacy and data security concerns arise when artificial intelligence analyzes patients’ data. Because many AI computations run in the cloud, investing in advanced data governance solutions is crucial for companies.

There is growing awareness that algorithms are prone to bias and that mitigation strategies are necessary to avoid disparities. Because these algorithms rely on large amounts of training data, it is important to ensure that the data is diverse and representative of the entire population. Moreover, transparency and accountability in AI algorithms are crucial to maintaining trust, which is sometimes difficult given that many models are not explainable. Addressing these challenges is a general frontier in using AI in healthcare. However, it’s worth underlining that such ethical considerations arise less often in drug discovery applications of AI and have a more significant impact when AI is used in diagnostics, precision medicine and clinical trials.

Empower your organization with AI

The augmentation of the biotech industry with AI holds tremendous promise for accelerating the drug discovery process. With continued advancements in AI algorithms and increased collaboration between biotech experts and AI researchers, the industry can expect groundbreaking developments that positively impact human health and well-being.

Many tasks in the drug discovery process can already be supported by tools that increase effectiveness, reduce costs and help the organization achieve more. This article only scratches the surface of what is possible. Though AI is versatile, it often requires additional work to fit a process or an organization’s needs. If you lack in-house expertise, working with AI specialists to develop a custom solution is a wise choice. Contact our team to create a bespoke AI platform that addresses your biotechnology software needs.

Sources

AlphaFold’s whitepaper: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/a-glimpse-of-the-next-generation-of-alphafold/alphafold_latest_oct2023.pdf

Makya: https://iktos.ai/technology-and-capabilities/generative-ai/

Iktos and Pfizer: https://www.businesswire.com/news/home/20210302005501/en/Iktos-Announces-Collaboration-With-Pfizer-in-AI-for-Drug-Design

Insilico’s first AI drug: https://www.prnewswire.com/apac/news-releases/first-drug-discovered-and-designed-with-generative-ai-enters-phase-ii-trials-with-first-patients-dosed-301862737.html

Chemistry42: https://insilico.com/chemistry42

PandaOmics: https://insilico.com/pandaomics

BenevolentAI and AstraZeneca: https://www.fiercebiotech.com/medtech/astrazeneca-takes-home-two-more-computer-generated-drug-targets-benevolentai

BenevolentAI and Merck KGaA: https://european-biotechnology.com/up-to-date/latest-news/news/benevolentai-ltd-inks-us594m-partnership-with-merck-kgaa.html

BenAI engine: https://www.benevolent.com/benevolent-platform/benai-engine/

CURIE: https://www.data4cure.com

LLM-based chat for life sciences: https://www.ferma.ai/

LLM for RNA: https://atomic.ai

NVIDIA BioNeMo: https://blogs.nvidia.com/blog/bionemo-large-language-models-drug-discovery/

AI outperforms dermatologists in skin cancer diagnosis: https://www.nature.com/articles/s41598-021-96707-8

AI in Radiology: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6268174/

IMACEL: https://www.nature.com/articles/d43747-021-00042-w

Recursion Operating System: https://www.recursion.com/approach

Genentech and NVIDIA – AI and the lab in the loop: https://www.youtube.com/watch?v=-Ijg2g8AsjE

Bias in AI for healthcare: https://www.nature.com/articles/s41746-023-00858-z

About the author: Jacek Szmatka

Head of Life Sciences

An open-minded leader with over 20 years’ experience in the IT world, Jacek’s career has seen him evolve from a computer science graduate to software engineer to a co-founder and CTO of a tech start-up. Before joining Software Mind, Jacek was part of a team that developed a bioinformatics company and served as an executive board member. In his current role as Head of Life Sciences, Jacek helps leading life sciences companies design and build innovative solutions. A true believer in the transformative power technology can have on our lives, Jacek maintains a keen interest in R & D, in particular with solutions that involve AI, IoT, life science and cloud technologies.
