28 Feb 2024

# Ascertainment Bias – Machine Learning in the Medical Sector

1. Clinical Trials
2. Elementary Statistics
3. Ascertainment Bias

## 1 – Clinical Trials

A clinical trial is a research study designed to answer a scientific question, such as whether a new treatment is safe and works better than the existing one. The goal of most trials is to find better ways to prevent, diagnose, or treat diseases.

A new treatment might be tested against the best current treatment, against standard care, or against a placebo (a dummy pill or injection that has no real effect). You can find trials running near you with a quick Google search.

## 2 – Elementary Statistics

Elementary statistics is the study of collecting, organizing, and analyzing data. It provides the analytical methods needed to understand data and make informed decisions.

It is widely used in business to make predictions about future outcomes based on past data, and it applies just as well to fields such as science and healthcare.

### Machine Learning

Elementary statistics is vital for machine learning for several reasons. For one, understanding basic statistical concepts allows machine learning engineers in the medical field to evaluate data properly. Without a foundation in statistics, it is difficult to determine whether a dataset is representative of a population, or whether the results of an experiment are statistically significant.
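As a rough illustration of checking whether a dataset is representative, the sketch below compares the age-group counts in a dataset with the shares expected from the wider population, using a chi-square goodness-of-fit statistic. The age groups, counts, and population shares are all made-up numbers for illustration. The p-value shortcut only holds for 2 degrees of freedom (three groups).

```python
import math

# Hypothetical age-group counts in a dataset, and the assumed shares of
# those groups in the wider population (illustrative numbers only).
observed = {"18-39": 120, "40-64": 260, "65+": 420}
population_share = {"18-39": 0.40, "40-64": 0.35, "65+": 0.25}


def chi_square_statistic(observed_counts, expected_share):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    total = sum(observed_counts.values())
    stat = 0.0
    for group, count in observed_counts.items():
        expected = total * expected_share[group]
        stat += (count - expected) ** 2 / expected
    return stat


stat = chi_square_statistic(observed, population_share)

# Three groups give 2 degrees of freedom. For exactly 2 degrees of freedom
# the chi-square p-value has the closed form exp(-stat / 2).
p_value = math.exp(-stat / 2)

# A small p-value suggests the dataset's age mix differs from the population.
looks_representative = p_value >= 0.05

print(f"chi-square = {stat:.2f}, p-value = {p_value:.3g}")
print("looks representative:", looks_representative)
```

With these invented counts, older patients are heavily over-represented, so the check flags the dataset as unlikely to be a random draw from the population.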

In addition, many methods used in machine learning research require a strong understanding of statistics. Techniques such as regression and classification are built on statistical methods, and evaluating a model relies on statistics as well. Understanding elementary statistical concepts is thus essential for any machine learning engineer.

## 3 – Ascertainment Bias

### What is Ascertainment Bias

Ascertainment bias is the systematic tendency for investigators to study the characteristics of those cases or individuals that are most easily identified and/or studied. It results in a distortion of the scientific evidence because it leads to an overemphasis on certain types of information and an underrepresentation of other types of information.

For example, ascertainment bias can cause investigators to focus on studies that use diagnostic tests that are easy to administer (e.g., blood tests) rather than diagnostic tests that are more difficult to administer (e.g., examinations of the brain). This can lead to a distorted view of the incidence and prevalence of diseases because studies that use easier diagnostic tests will be more likely to be published than studies that use more difficult diagnostic tests.

### How does it affect your machine learning models

Ascertainment bias affects machine learning models by skewing the data that the model is trained on. This can result in inaccurate predictions and poorer performance overall. Ascertainment bias typically occurs when data is collected in a non-random way, such as only collecting data from people who are already sick, or only collecting data from people who have already been diagnosed with a certain condition. This can lead to false conclusions about how effective a treatment is or what causes a disease.
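The effect of non-random collection can be seen in a small simulation. The sketch below assumes a population with a true disease prevalence of 10%, and assumes that sick people are far more likely to end up in the records than healthy people (80% versus 10%). These rates are invented for illustration. It then estimates prevalence from a simple random sample and from the ascertained records.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

TRUE_PREVALENCE = 0.10  # assumed true share of sick people

# True means "sick", False means "healthy".
population = [random.random() < TRUE_PREVALENCE for _ in range(100_000)]

# A simple random sample: every person is equally likely to be chosen.
random_sample = random.sample(population, 2_000)


def is_recorded(is_sick):
    """Assumed ascertainment process: sick people are recorded far more often."""
    probability = 0.8 if is_sick else 0.1
    return random.random() < probability


# The ascertained sample only contains people who were recorded.
ascertained_sample = [person for person in population if is_recorded(person)]


def prevalence(sample):
    """Share of sick people in a sample."""
    return sum(sample) / len(sample)


print(f"true prevalence:        {prevalence(population):.3f}")
print(f"random sample estimate: {prevalence(random_sample):.3f}")
print(f"ascertained estimate:   {prevalence(ascertained_sample):.3f}")
```

Under these assumptions the random sample lands close to the true 10%, while the ascertained records put prevalence at several times that. A model trained on the ascertained records would inherit the same distortion.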

### Spotting Ascertainment Bias

To detect if your dataset has ascertainment bias, you’ll need to look for patterns in the data that indicate that some items are more likely to be included than others. This can be done using various machine learning algorithms.

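One approach is to use a supervised learning algorithm. Given a reference population in which some items made it into the dataset and others did not, train a classifier to predict inclusion from each item's features. If the classifier predicts inclusion noticeably better than chance, inclusion depends on those features, which is a sign of ascertainment bias.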
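A minimal sketch of the classifier idea follows, assuming you have a reference population with a flag saying who ended up in the dataset. The data here is simulated, with a single hypothetical feature (age) and inclusion made to depend on age on purpose. The classifier is a small logistic regression fitted by gradient descent, written with the standard library only. In practice you would likely use an established library and score the model on held-out data.

```python
import math
import random

random.seed(1)


def make_person():
    """Simulated person: older people are more likely to be included."""
    age = random.uniform(20, 90)
    p_included = 1 / (1 + math.exp(-(age - 55) / 10))  # assumed bias
    return age, 1 if random.random() < p_included else 0


people = [make_person() for _ in range(2_000)]
ages = [age for age, _ in people]
included = [flag for _, flag in people]

# Standardize the feature so gradient descent behaves well.
mean_age = sum(ages) / len(ages)
std_age = math.sqrt(sum((a - mean_age) ** 2 for a in ages) / len(ages))
x = [(a - mean_age) / std_age for a in ages]


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


# Logistic regression with one weight and a bias, fitted by batch gradient descent.
weight, bias = 0.0, 0.0
learning_rate = 0.5
for _ in range(300):
    grad_w = grad_b = 0.0
    for xi, yi in zip(x, included):
        error = sigmoid(weight * xi + bias) - yi
        grad_w += error * xi
        grad_b += error
    weight -= learning_rate * grad_w / len(x)
    bias -= learning_rate * grad_b / len(x)

# Accuracy on the training data, compared with always guessing the majority class.
predictions = [1 if sigmoid(weight * xi + bias) >= 0.5 else 0 for xi in x]
accuracy = sum(p == y for p, y in zip(predictions, included)) / len(included)
n_included = sum(included)
baseline = max(n_included, len(included) - n_included) / len(included)

print(f"learned weight for age: {weight:.2f}")
print(f"accuracy: {accuracy:.3f} (majority baseline: {baseline:.3f})")
```

A weight far from zero, together with accuracy well above the majority baseline, indicates that age helps predict who was included. If inclusion were unrelated to age, the weight would stay near zero and accuracy would stay near the baseline.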

Another approach is to use unsupervised learning algorithms such as clustering to identify groups of items that are more likely to be included in the dataset. By comparing the groups, you can look for patterns that indicate biases in the dataset.

### Removing Ascertainment Bias

Remember that ascertainment bias is a type of selection bias that occurs when study subjects are not selected at random. This can cause differences between groups of subjects that are not due to the intervention or disease being studied, but rather to the method by which the subjects were chosen.

There are a few ways to try to minimize ascertainment bias in your dataset:

1. Randomize your selection of study subjects. This will help to ensure that any differences between groups are due to the intervention or disease being studied, and not due to the method by which the study subjects were chosen.
2. Use a control group. A control group is a group of study subjects who do not receive the intervention or treatment being studied. It gives you a baseline to compare the treated group against.
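The two steps above can be sketched as follows. This is a minimal example, assuming you have a list of eligible subject IDs to draw from. The ID format, the group sizes, and the function names `select_subjects` and `assign_groups` are hypothetical.

```python
import random


def select_subjects(eligible_ids, n_subjects, rng):
    """Draw study subjects at random, so every eligible person has the same chance."""
    return rng.sample(eligible_ids, n_subjects)


def assign_groups(subject_ids, rng):
    """Randomly split the subjects into a treatment group and a control group."""
    shuffled = list(subject_ids)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


rng = random.Random(42)  # seeded so the allocation can be reproduced and audited

# Hypothetical list of 200 eligible people.
eligible = [f"subject_{i:03d}" for i in range(1, 201)]

subjects = select_subjects(eligible, 20, rng)
groups = assign_groups(subjects, rng)

print("treatment:", sorted(groups["treatment"]))
print("control:  ", sorted(groups["control"]))
```

Because both the selection and the split are driven by the random number generator, nobody running the study decides who takes part or who receives the treatment.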