Deep Learning Imputation – Using AI to Derive Valuable Insights from Drug Discovery Data

Speaker: Dr. Sumie Tajima
Institute: Hulinks, Inc.
Country: Tokyo, Japan
Speaker Link: https://www.hulinks.co.jp/en
Speaker: Dr. Matthew Segall
Institute: Optibrium Limited
Country: Cambridge, UK
Speaker Link: https://optibrium.com/
Time: 09:00 CET 07-Feb-23

Dr. Sumie Tajima and Dr. Matt Segall

(Dr. S.T.) Hulinks, Inc., Tokyo, Japan; (Dr. M.S.) Optibrium Limited, Cambridge, UK

It’s impossible to experimentally measure all of the data we want for all compounds in a drug discovery project. Furthermore, the limited data we have are noisy because of experimental variability and error.
We will describe a method that uses deep learning to impute these sparse and noisy data. Imputation is the process of filling in missing data in a dataset using the data that are present, which appears simple when only a few datapoints are missing, but is challenging when more than half, or even 99% of the data are missing.
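To make the idea of imputation concrete, here is a minimal sketch in Python. It is not the deep learning method described in this talk. It is a deliberately simple stand-in that fills a missing value for one endpoint from a straight-line fit to a second, correlated endpoint. The function name, the toy matrix and the choice of a linear fit are all assumptions made for illustration.

```python
import numpy as np

def impute_from_correlated_endpoint(data, target_col, source_col):
    """Fill missing values (NaN) in target_col using a straight-line fit
    to source_col, learned from rows where both endpoints were measured.

    Illustrative baseline only; NOT the deep learning imputation method.
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    target = data[:, target_col]
    source = data[:, source_col]

    # Rows where both endpoints are observed are used to learn the relationship.
    both = ~np.isnan(target) & ~np.isnan(source)
    slope, intercept = np.polyfit(source[both], target[both], deg=1)

    # Only rows with a measured source value can be imputed this way.
    missing = np.isnan(target) & ~np.isnan(source)
    filled[missing, target_col] = slope * source[missing] + intercept
    return filled

# Toy, made-up matrix: rows are compounds, columns are two endpoints.
toy = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, np.nan],     # endpoint 1 missing, endpoint 0 measured
    [np.nan, np.nan],  # nothing measured, so nothing can be imputed
])
completed = impute_from_correlated_endpoint(toy, target_col=1, source_col=0)
```

In this toy case the fourth compound receives an imputed value, while the fifth remains missing because no measured endpoint is available for it. With sparse data, a simple approach like this quickly runs out of rows to learn from, which is the difficulty the paragraph above describes.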
Deep learning imputation learns both from structure-activity relationships (SAR) and directly from the relationships between experimental endpoints, based on sparse data [1]. The resulting models can proactively highlight high-quality compounds by ‘filling in’ missing data more accurately than conventional quantitative structure-activity relationship (QSAR) models. They can also identify hidden opportunities caused by missing, uncertain or inaccurate data, and help prioritise experimental resources by focussing on measuring the most valuable data to inform decisions about compound progression.
We will describe practical applications of deep learning imputation and compare the results with those from conventional predictive modelling methods. We will demonstrate the application in the context of a drug discovery project, in which deep learning imputation achieved an average R² of 0.72 vs 0.50 for the best QSAR method across 18 heterogeneous endpoints, including compound activities and ADME properties [2]. We will also present an application in combination with generative chemistry methods to identify a novel, active antimalarial compound that revealed new SAR, previously unknown to the project team [3]. Finally, we will show an application to the prediction of particularly challenging sensory properties, assessed in panels of human subjects, and compare the results with other methods, including multi-target deep neural networks [4].
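The sketch below shows how a per-endpoint R² might be averaged across endpoints when the measured data are sparse. It assumes that R² means the coefficient of determination, which the abstract does not state, and that each endpoint is scored only on compounds with a measured value. The function names and the example numbers are hypothetical and are not taken from the cited studies.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination, ignoring compounds with no measurement."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = ~np.isnan(observed) & ~np.isnan(predicted)
    obs, pred = observed[mask], predicted[mask]
    ss_res = np.sum((obs - pred) ** 2)        # residual sum of squares
    ss_tot = np.sum((obs - obs.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

def mean_r_squared(observed, predicted):
    """Average the per-endpoint R² over the columns of a compounds x endpoints matrix."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    scores = [r_squared(observed[:, j], predicted[:, j])
              for j in range(observed.shape[1])]
    return float(np.mean(scores))

# Made-up example: three compounds, two endpoints, one value unmeasured.
observed = np.array([[1.0, 10.0],
                     [2.0, np.nan],
                     [3.0, 30.0]])
predicted = np.array([[1.0, 10.0],
                      [2.0, 20.0],
                      [3.0, 30.0]])
average = mean_r_squared(observed, predicted)
```

Scoring each endpoint separately before averaging keeps endpoints with many measurements from dominating those with few, which matters when the endpoints are as heterogeneous as activities and ADME properties.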


References

[1] - Irwin et al. Appl. AI Lett. (2021) DOI: 10.1002/ail2.31
[2] - Irwin et al. J. Chem. Inf. Model. (2020) 60(6), pp. 2848–2857
[3] - Tse et al. J. Med. Chem. (2021) 64(22), pp. 16450–16463
[4] - Mahmoud et al. J. Comput. Aided Mol. Des. (2021) 35(11), pp. 1125–1140