Confusion matrix

Shengping Yang PhD, Gilbert Berdine MD

Corresponding author: Shengping Yang
Contact Information: Shengping.Yang@pbrc.edu
DOI: 10.12746/swrccc.v12i53.1391

I am evaluating the sensitivity and specificity of an assay for COVID-19 diagnosis, and our team is developing a confusion matrix for this analysis. Could you explain the key considerations when using a confusion matrix for this purpose?

In biomedical research, particularly when evaluating diagnostic tests or predictive models, performance metrics are essential for assessing the effectiveness of assays or classification systems. One commonly used tool is a contingency table, which displays the frequency distribution of categorical variables. A confusion matrix is a specialized form of a contingency table used to assess the performance of classification algorithms by showing the actual versus predicted outcomes. It is particularly useful for evaluating key metrics such as sensitivity, specificity, accuracy, and precision, which are crucial for interpreting the performance of diagnostic tests.

1. THE CONFUSION MATRIX

A confusion matrix is a specific type of two-dimensional contingency table used to evaluate the performance of a classification model. Its two dimensions, “actual” and “predicted,” represent identical sets of “classes” (e.g., disease positive and disease negative), allowing for a direct comparison between actual and predicted outcomes.1,2

Specifically, in a confusion matrix, each row represents an actual class, while each column represents a predicted class (or vice versa). The diagonal cells represent correctly predicted outcomes, while the off-diagonal cells represent misclassifications. The matrix provides a clear visualization of where the model confuses different classes, which is why it is called a confusion matrix.

Table 1 provides an example of a confusion matrix, where the rows represent actual conditions, and the columns represent predicted conditions. The matrix contains four key components: True Positives (TP): The model predicts positive, and the actual condition is positive; False Negatives (FN): The model predicts negative, but the actual condition is positive; False Positives (FP): The model predicts positive, but the actual condition is negative; True Negatives (TN): The model predicts negative, and the actual condition is negative. Furthermore, the sums of these components define: Actual positive cases (P) = TP + FN; Actual negative cases (N) = FP + TN; Predicted positive cases (PP) = TP + FP; Predicted negative cases (PN) = FN + TN.

Table 1. An Example Confusion Matrix

                        Predicted Positive (PP)    Predicted Negative (PN)
Actual Positive (P)     True Positive (TP)         False Negative (FN)
Actual Negative (N)     False Positive (FP)        True Negative (TN)
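
The layout of Table 1 can be illustrated with a short sketch that tallies the four cells from paired actual and predicted labels. The function name `confusion_counts` and the example labels are hypothetical and serve only as an illustration; they are not part of any assay data.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=1):
    """Tally TP, FN, FP, and TN from paired actual/predicted labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            # Actual positive: either detected (TP) or missed (FN)
            counts["TP" if p == positive else "FN"] += 1
        else:
            # Actual negative: either falsely flagged (FP) or correctly cleared (TN)
            counts["FP" if p == positive else "TN"] += 1
    return {cell: counts[cell] for cell in ("TP", "FN", "FP", "TN")}


# Hypothetical labels for illustration only (1 = positive, 0 = negative)
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

cm = confusion_counts(actual, predicted)

# Marginal sums as defined in the text
P = cm["TP"] + cm["FN"]   # actual positive cases
N = cm["FP"] + cm["TN"]   # actual negative cases
PP = cm["TP"] + cm["FP"]  # predicted positive cases
PN = cm["FN"] + cm["TN"]  # predicted negative cases

print(cm)  # {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 3}
```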

While the confusion matrix presents data in a straightforward manner, several important metrics can be derived from it to assess the performance of a diagnostic test or predictive model.

2. KEY METRICS DERIVED FROM THE CONFUSION MATRIX

Commonly used metrics that can be derived from a confusion matrix include sensitivity, specificity, accuracy, and precision.2,3
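
As a minimal sketch, the four metrics named in the introduction can be computed from the cells of a confusion matrix using their standard textbook definitions. The helper names `safe_div` and `metrics` are hypothetical, and the counts passed in are illustrative only.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def metrics(tp, fn, fp, tn):
    """Standard definitions of four metrics derived from a 2 x 2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        # Proportion of actual positives that are predicted positive
        "sensitivity": safe_div(tp, tp + fn),
        # Proportion of actual negatives that are predicted negative
        "specificity": safe_div(tn, tn + fp),
        # Proportion of all cases that are classified correctly
        "accuracy": safe_div(tp + tn, total),
        # Proportion of predicted positives that are actual positives
        # (also called the positive predictive value)
        "precision": safe_div(tp, tp + fp),
    }


# Hypothetical counts for illustration only
m = metrics(tp=3, fn=1, fp=1, tn=3)
print(m)
```

Returning None for an undefined ratio (for example, precision when no case is predicted positive) is one possible design choice; raising an error or reporting "not estimable" would be equally reasonable.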

Table 2. An Example of an Imbalanced Dataset

                        Predicted Positive (PP)    Predicted Negative (PN)
Actual Positive (P)     0                          10
Actual Negative (N)     0                          990
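
The counts in Table 2 can be worked through with the standard definitions of the metrics. The following sketch uses only the numbers in the table; the variable names are illustrative.

```python
# Counts taken from Table 2
tp, fn, fp, tn = 0, 10, 0, 990
total = tp + fn + fp + tn

accuracy = (tp + tn) / total   # 990 / 1000 = 0.99
sensitivity = tp / (tp + fn)   # 0 / 10 = 0.0
specificity = tn / (tn + fp)   # 990 / 990 = 1.0

# Precision is undefined here because no case is predicted positive
precision = tp / (tp + fp) if (tp + fp) else None

print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}, "
      f"specificity={specificity:.2f}, precision={precision}")
```

A classifier that labels every case as negative therefore reaches 99% accuracy on this dataset while detecting none of the 10 actual positives, which is why accuracy alone can mislead when the classes are imbalanced.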

Table 3. An Example of Test Results for a Rare (Low Prevalence) Disease

                        Predicted Positive (PP)    Predicted Negative (PN)
Actual Positive (P)     90 (100 × 90%)             10
Actual Negative (N)     99                         9,801 (9,900 × 99%)
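
The counts in Table 3 correspond to a test with 90% sensitivity and 99% specificity applied to 10,000 individuals, of whom 100 (1%) have the disease. The sketch below derives the predictive values from these counts and, as a cross-check, from Bayes' theorem. The variable names are illustrative.

```python
# Counts taken from Table 3
tp, fn, fp, tn = 90, 10, 99, 9801
total = tp + fn + fp + tn          # 10,000

prevalence = (tp + fn) / total     # 100 / 10,000 = 0.01
sensitivity = tp / (tp + fn)       # 90 / 100 = 0.90
specificity = tn / (tn + fp)       # 9,801 / 9,900 = 0.99

# Positive predictive value (precision) and negative predictive value
ppv = tp / (tp + fp)               # 90 / 189, about 0.476
npv = tn / (tn + fn)               # 9,801 / 9,811, about 0.999

# Cross-check: PPV expressed through prevalence, sensitivity, and specificity
ppv_bayes = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
)

print(f"PPV={ppv:.3f}, NPV={npv:.3f}, PPV via Bayes={ppv_bayes:.3f}")
```

Despite the high sensitivity and specificity, fewer than half of the positive results are true positives at a prevalence of 1%, because the 99 false positives among the 9,900 disease-free individuals outnumber the 90 true positives.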

Other metrics include the F1 Score, the Fowlkes-Mallows Index, and the Matthews Correlation Coefficient.5 However, these are not discussed in detail here.

3. APPLICATIONS IN BIOMEDICAL RESEARCH

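Confusion matrices have many applications in biomedical research, including the evaluation of diagnostic tests, disease risk prediction, medical image analysis, prediction of drug-induced toxicity, and genetics and genomics.6–11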

4. CHALLENGES IN BIOMEDICAL APPLICATIONS

There are challenges in determining the best application of a confusion matrix in biomedical research.

Additional challenges include threshold tuning and multi-class classification. These issues must be addressed with careful consideration of the specific diseases or conditions being evaluated in clinical practice.
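
To illustrate the multi-class case, one common approach (assumed here, not necessarily the one the authors have in mind) is to collapse a k × k confusion matrix into a separate 2 × 2 table for each class, treating that class as "positive" and all others as "negative" (one-vs-rest). The class names, counts, and the function name `one_vs_rest` below are hypothetical.

```python
# Hypothetical 3-class confusion matrix: rows = actual, columns = predicted
labels = ["A", "B", "C"]
matrix = [
    [50, 3, 2],   # actual A
    [4, 40, 6],   # actual B
    [1, 5, 39],   # actual C
]


def one_vs_rest(matrix, k):
    """Collapse a multi-class matrix into TP, FN, FP, TN for class index k."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                          # actual k, predicted k
    fn = sum(matrix[k]) - tp                   # actual k, predicted another class
    fp = sum(row[k] for row in matrix) - tp    # actual another class, predicted k
    tn = total - tp - fn - fp                  # everything else
    return tp, fn, fp, tn


for k, name in enumerate(labels):
    tp, fn, fp, tn = one_vs_rest(matrix, k)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    print(f"class {name}: TP={tp} FN={fn} FP={fp} TN={tn} "
          f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

Each class then has its own sensitivity and specificity, so a single summary number can hide poor performance on one class.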

5. CONFUSION MATRIX AND RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE

Both the confusion matrix and the ROC curve are essential tools for evaluating classifier performance, but they serve different purposes. The confusion matrix describes performance at a single decision threshold, yielding metrics such as precision, sensitivity, and accuracy at that threshold. In contrast, the ROC curve plots sensitivity against the false positive rate (FPR = 1 − specificity) across a range of classification thresholds, and thus summarizes how effectively the classifier distinguishes between the positive and negative classes over all possible thresholds.12 Importantly, each point on a ROC curve corresponds to one confusion matrix, because both sensitivity and FPR can be derived from it. In this sense, the confusion matrix and the ROC curve complement each other in evaluating the performance of a diagnostic tool.
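
The relationship between the two tools can be sketched by building one confusion matrix per threshold and recording the resulting (FPR, sensitivity) pair. The scores, labels, thresholds, and the function name `roc_points` are hypothetical, and a case is assumed to be called positive when its score is at or above the threshold.

```python
def roc_points(scores, labels, thresholds):
    """Return (threshold, FPR, sensitivity) for each threshold.

    Assumes labels are 1 (positive) or 0 (negative) and that both
    classes are present, so neither denominator is zero.
    """
    points = []
    for t in thresholds:
        # One confusion matrix per threshold
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fpr = fp / (fp + tn)            # 1 - specificity
        sensitivity = tp / (tp + fn)
        points.append((t, fpr, sensitivity))
    return points


# Hypothetical assay scores and true disease status, for illustration only
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(scores, labels, thresholds=[0.25, 0.50, 0.75])
for t, fpr, sens in points:
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  sensitivity={sens:.2f}")
# threshold=0.25  FPR=0.50  sensitivity=1.00
# threshold=0.50  FPR=0.25  sensitivity=0.75
# threshold=0.75  FPR=0.00  sensitivity=0.50
```

Lowering the threshold raises sensitivity at the cost of a higher FPR; plotting these pairs over many thresholds traces the ROC curve.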

In summary, the confusion matrix is an important tool for evaluating the performance of diagnostic tests and tools, particularly in healthcare and biomedical research. While it provides straightforward metrics for assessing such performance, the interpretation of these metrics can be misleading, especially in the context of rare diseases. Nevertheless, confusion matrices are increasingly used in many areas of biomedical research, including personalized medicine and explainable artificial intelligence modeling.


REFERENCES

  1. Chapter 2 Contingency Tables (uchicago.edu). (last access: Oct. 5, 2024)
  2. Confusion matrix – Wikipedia. (last access: Oct. 5, 2024)
  3. Monaghan TF, Rahman SN, Agudelo CW, et al. Foundational statistical principles in medical research: sensitivity, specificity, positive predictive value, and negative predictive value. Medicina (Kaunas) 2021 May 16;57(5):503. doi: 10.3390/medicina57050503
  4. Guesné SJJ, Hanser T, Werner S, Boobier S, Scott S. Mind your prevalence! J Cheminform 2024 Apr 15;16(1):43. doi: 10.1186/s13321-024-00837-w
  5. Chicco D, Jurman G. A statistical comparison between Matthews correlation coefficient (MCC), prevalence threshold, and Fowlkes-Mallows index. J Biomed Inform 2023 Aug;144:104426. doi: 10.1016/j.jbi.2023.104426
  6. Lin D, Liu L, Zhang M, et al. Evaluations of the serological test in the diagnosis of 2019 novel coronavirus (SARS-CoV-2) infections during the COVID-19 outbreak. Eur J Clin Microbiol Infect Dis 2020 Dec;39(12):2271–7. doi: 10.1007/s10096-020-03978-6. Epub 2020 Jul 17.
  7. Pal M, Parija S, Panda G, Dhama K, et al. Risk prediction of cardiovascular disease using machine learning classifiers. Open Med (Wars) 2022 Jun 17;17(1):1100–13. doi: 10.1515/med-2022-0508
  8. Gull S, Akbar S, Khan HU. Automated detection of brain tumor through magnetic resonance images using convolutional neural network. Biomed Res Int 2021 Nov 30;2021:3365043. doi: 10.1155/2021/3365043
  9. Klauschen F, Goldman A, Barra V, et al. Evaluation of automated brain MR image segmentation and volumetry methods. Hum Brain Mapp 2009 Apr;30(4):1310–27. doi: 10.1002/hbm.20599
  10. Adeluwa T, McGregor BA, Guo K, et al. Predicting drug-induced liver injury using machine learning on a diverse set of predictors. Front Pharmacol 2021 Aug 18;12:648805. doi: 10.3389/fphar.2021.648805
  11. Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet 2015 Jun;16(6):321–32. doi: 10.1038/nrg3920
  12. Confusion Matrix: How To Use It & Interpret Results [Examples] (v7labs.com). (last access: Oct. 5, 2024)

Article citation: Yang S, Berdine G. Confusion matrix. The Southwest Respiratory and Critical Care Chronicles 2024;12(53):75–79
From: Department of Biostatistics (SY), Pennington Biomedical Research Center, Baton Rouge, LA; Department of Internal Medicine (GB), Texas Tech University Health Sciences Center, Lubbock, Texas
Submitted: 10/9/2024
Accepted: 10/10/2024
Conflicts of interest: none
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.