Data analysis methods for classification of patient samples shown to be unreliable
9 September 2008
Computer-based methods that have been used for 25 years for
classifying patient samples are worthless, according to researchers at
Uppsala University, Sweden, in an article published in the journal
Pattern Recognition Letters.
Today there is rapidly growing interest in ‘intelligent’
computer-based methods that use various classes of measurement signals,
for instance from different patient samples, to create a model for
classifying new observations.
This type of method is the basis for many technical applications,
such as recognition of human speech, images, and fingerprints, and is
now also beginning to be adopted in new fields such as healthcare.
“Especially in applications in which faulty classification decisions
can lead to catastrophic consequences, such as choosing the wrong form
of therapy for treating cancer, it is extremely important to be able to
make a reliable estimate of the performance of the classification
model,” explains Mats Gustafsson, Professor of signal processing and
medical bioinformatics at Uppsala University, who co-directed the new
study together with Associate Professor Anders Isaksson.
To evaluate the performance of a classification model, one normally
tests it on a number of trial examples that have never been involved in
the design of the model. Unfortunately, there are seldom tens of
thousands of test examples available for this type of evaluation.
In biomedicine, for example, it is often expensive and difficult to
collect the patient samples needed, especially to analyze a rare
disease. To solve this problem, many different methods have been
proposed. Since the 1980s, two methods have completely dominated
research: cross-validation and resampling/bootstrapping.
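The instability the researchers describe can be seen in a few lines of code. The sketch below is not from the paper; it is a minimal illustration using a hypothetical one-dimensional nearest-centroid classifier, evaluated with leave-one-out cross-validation on repeated small samples (20 patients) drawn from the same two overlapping classes. Although the true class separation never changes, the cross-validation estimate swings widely from one small sample to the next.

```python
import random
import statistics

def loocv_accuracy(xs, ys):
    """Leave-one-out cross-validation accuracy of a one-dimensional
    nearest-centroid classifier (illustrative toy model)."""
    correct = 0
    for i in range(len(xs)):
        # Centroids are computed without the held-out sample i.
        train = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j != i]
        c0 = statistics.mean(x for x, y in train if y == 0)
        c1 = statistics.mean(x for x, y in train if y == 1)
        pred = 0 if abs(xs[i] - c0) <= abs(xs[i] - c1) else 1
        correct += pred == ys[i]
    return correct / len(xs)

random.seed(0)
# Two overlapping Gaussian classes; the true separability is fixed,
# but each small study of 20 samples yields a different CV estimate.
estimates = []
for _ in range(200):
    ys = [0] * 10 + [1] * 10
    xs = [random.gauss(0.0, 1.0) if y == 0 else random.gauss(1.0, 1.0)
          for y in ys]
    estimates.append(loocv_accuracy(xs, ys))

print(f"min={min(estimates):.2f}  max={max(estimates):.2f}  "
      f"spread={max(estimates) - min(estimates):.2f}")
```

Running this shows a large spread between the lowest and highest estimated accuracies, even though every sample comes from exactly the same underlying problem; a single small study reports only one point from that wide distribution.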
“This means that the performance assessment of virtually all
new methods and applications reported in the scientific literature over
the last 25 years has been carried out using one of these two methods,”
says Mats Gustafsson.
In the new study, the Uppsala researchers use both theory and
convincing computer simulations to show that this methodology is
worthless in practice when the total number of examples is small in
relation to the natural variation that exists among different
observations. What counts as a small number depends in turn on the
problem being studied, which means it is generally impossible to know in
advance whether the available number of examples is sufficient.
“Our main conclusion is that this methodology cannot be depended on
at all, and that it therefore needs to be immediately replaced by
Bayesian methods, for example, which can deliver reliable measures of
the uncertainty that exists. Only then will multivariate analyses be in
any position to be adopted in such critical applications as healthcare,”
says Mats Gustafsson.
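One simple example of the kind of Bayesian uncertainty measure Gustafsson alludes to (this is an illustration, not the specific method from the paper) is a credible interval for a classifier's accuracy: with a uniform Beta(1, 1) prior, observing `correct` successes out of `total` independent test cases gives a Beta(correct + 1, errors + 1) posterior, whose quantiles can be read off by sampling.

```python
import random

def accuracy_credible_interval(correct, total, level=0.95,
                               draws=100_000, seed=1):
    """Bayesian credible interval for classifier accuracy under a
    uniform Beta(1, 1) prior; the posterior is Beta(correct + 1,
    errors + 1), approximated here by Monte Carlo sampling."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(correct + 1, total - correct + 1)
                     for _ in range(draws))
    lo = samples[int((1 - level) / 2 * draws)]
    hi = samples[int((1 + level) / 2 * draws)]
    return lo, hi

# 18 of 20 test samples classified correctly: the point estimate is
# 0.90, but the interval shows how little 20 samples actually pin down.
lo, hi = accuracy_credible_interval(18, 20)
print(f"95% credible interval: [{lo:.2f}, {hi:.2f}]")
```

The resulting interval is strikingly wide, which is precisely the point: rather than a single optimistic-looking number, the Bayesian summary makes the uncertainty caused by the small test set explicit.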
1. A. Isaksson, M. Wallman, H. Göransson, M.G. Gustafsson,
"Cross-validation and bootstrapping are unreliable in small sample
classification," Pattern Recognition Letters, Volume 29, Issue 14,
pp. 1960–1965 (15 October 2008)