Applicability Domain

The applicability domain (AD) of a quantitative structure-activity relationship (QSAR) model is the physico-chemical, structural or biological space, knowledge or information on which the training set of the model has been developed, and within which the model can be applied to make predictions for new compounds.

The purpose of the AD is to state whether the model's assumptions are met for a given compound. In general, they are met when the prediction is an interpolation within the training space rather than an extrapolation beyond it. Although there is so far no single generally accepted algorithm for determining the AD, a fairly systematic approach exists for defining interpolation regions [1]. It involves removing outliers and estimating a probability density distribution by kernel-weighted sampling. A rigorous benchmarking study of several AD algorithms identified the standard deviation of the predictions of an ensemble of models as the most reliable approach [2].
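
The standard-deviation idea can be illustrated with a minimal sketch. It assumes an ensemble built by bootstrap resampling of the training set and plain least-squares linear models; both choices, and the names `ensemble_std`, `n_models` and `seed`, are illustrative assumptions and not the setup of the benchmarking study [2]. A large spread among the ensemble members' predictions for a query compound suggests that the compound lies outside the region supported by the training data.

```python
import numpy as np

def ensemble_std(X_train, y_train, X_query, n_models=50, seed=0):
    """Standard deviation of bootstrap-ensemble predictions per query compound.

    Hypothetical helper: linear least-squares models stand in for whatever
    QSAR model is actually used.
    """
    rng = np.random.default_rng(seed)
    n = X_train.shape[0]
    # Prepend a column of ones so each model has an intercept term.
    A_train = np.column_stack([np.ones(n), X_train])
    A_query = np.column_stack([np.ones(X_query.shape[0]), X_query])
    preds = np.empty((n_models, X_query.shape[0]))
    for m in range(n_models):
        # Bootstrap: draw n training compounds with replacement.
        idx = rng.integers(0, n, size=n)
        coef, *_ = np.linalg.lstsq(A_train[idx], y_train[idx], rcond=None)
        preds[m] = A_query @ coef
    # Spread across ensemble members, one value per query compound.
    return preds.std(axis=0, ddof=1)
```

A threshold on the returned values (for example a percentile of the values observed for the training compounds) would then separate compounds inside the AD from those outside; how to choose that threshold is left open here.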

To investigate the AD of a training set of chemicals, one can analyse properties of the multivariate descriptor space of the training compounds directly, or more indirectly via distance (or similarity) metrics. When using distance metrics, care should be taken to use an orthogonal and significant vector space. This can be achieved by feature selection followed by principal component analysis.
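
A distance-based check in an orthogonal space might look like the following sketch. It assumes that feature selection has already been done, standardizes the remaining descriptors, projects them onto the leading principal components, and rescales each component to unit variance so that Euclidean distance is not dominated by a single direction. The function names, the use of the distance to the training centroid, and the 95th-percentile cutoff are assumptions made for illustration, not a prescribed method.

```python
import numpy as np

def fit_ad(X_train, n_components=2, percentile=95.0):
    """Fit a PCA-based distance model of the training descriptor space."""
    mean = X_train.mean(axis=0)
    scale = X_train.std(axis=0, ddof=1)
    scale[scale == 0] = 1.0  # leave constant descriptors unscaled
    Z = (X_train - mean) / scale
    # PCA via SVD of the standardized (hence centred) descriptor matrix.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    model = {
        "mean": mean,
        "scale": scale,
        "components": Vt[:n_components],
        # Standard deviation of the training scores along each component.
        "score_std": S[:n_components] / np.sqrt(X_train.shape[0] - 1),
    }
    # Assumed cutoff: a percentile of the training compounds' own distances.
    model["threshold"] = np.percentile(distances(model, X_train), percentile)
    return model

def distances(model, X):
    """Distance to the training centroid in the rescaled PC space."""
    Z = (X - model["mean"]) / model["scale"]
    T = Z @ model["components"].T / model["score_std"]
    return np.linalg.norm(T, axis=1)

def in_domain(model, X):
    """True where a compound lies within the distance cutoff."""
    return distances(model, X) <= model["threshold"]
```

Distances to the nearest training neighbours could be used instead of the distance to the centroid; the centroid version is shown only because it is the shortest to write down.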

Notes

  1. Jaworska J, Nikolova-Jeliazkova N, Aldenberg T. QSAR applicability domain estimation by projection of the training set descriptor space: a review. Altern Lab Anim. 2005;33(5):445-459.
  2. Tetko IV, Sushko I, Pandey AK, Zhu H, Tropsha A, Papa E, Oberg T, Todeschini R, Fourches D, Varnek A. Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection. J Chem Inf Model. 2008;48(9):1733-1746.