SIIM-ACR Pneumothorax Segmentation

Can Artificial Intelligence recognize pneumothoraces (collapsed lungs) in chest X-rays and save lives?

Harshwardhan Jadhav
Published in Analytics Vidhya · Dec 8, 2020


AI Robot Analysing Digital Medical Images
Image Courtesy: https://www.medimaging.net/industry-news/articles/294778410/acr-releases-second-research-road-map-on-medical-imaging-ai.html

Artificial intelligence has made its way into all kinds of industries. Believe it or not, almost every application on your mobile phone uses AI to some extent, and many medical treatments now use AI for the diagnosis of various diseases. In fact, digital imaging is a very popular diagnostic route in the healthcare industry, and nowadays Artificial Intelligence helps greatly with such diagnoses by analyzing digital images from X-rays, CT scans, etc. In this blog, I am going to showcase my work on the case study “SIIM-ACR Pneumothorax Segmentation”, which involves recognizing a lung condition from chest X-rays.

So let’s just look at the outline of the blog,

  1. Business Problem
  2. Mapping Business Problem into Deep Learning Problem
  3. Existing Approaches
  4. My First cut approach
  5. Exploratory Data Analysis
  6. Data Preprocessing
  7. Deep Learning models
  8. Final Pipeline
  9. Deployment
  10. Future Extensions
  11. References

Now let’s begin the AI story,

Note: This is going to be a long story, but trust me, it’s really interesting. If you are in a hurry, you can go directly to the deployment section of this blog to see a working web application demo.

1. Business Problem:

1.1 Description:

First, we have to understand what Pneumothorax is, right?

So,

  • Pneumothorax is basically a combination of two words: pneumo (air) and thorax (chest). It is also known as a collapsed lung. A pneumothorax is an abnormal collection of air between the parietal and visceral pleura, i.e. the pleural space between the lungs and the chest wall. It is a relatively common respiratory condition that can occur in a wide range of patients and in various clinical settings. The figure below, comparing a normal lung with a pneumothorax-affected one, can give you a little idea of what it actually looks like.
Image Credit: https://www.firstaidforfree.com/what-is-a-spontaneous-pneumothorax/
  • Symptoms of pneumothorax include sudden onset of sharp, one-sided chest pain and shortness of breath. A pneumothorax can be caused by a blunt chest injury or by damage from underlying lung disease, or, most worryingly, it may occur for no obvious reason at all. On some occasions, a collapsed lung can be a life-threatening event. Diagnosing a pneumothorax by physical examination alone can be difficult, particularly for smaller pneumothoraces. Usually a chest X-ray, CT (computed tomography) scan, or ultrasound is used to detect or confirm its presence. A small pneumothorax typically resolves without treatment and requires only monitoring; this approach may be appropriate for people who have no underlying disease. For a larger pneumothorax, or if there is shortness of breath, the air may be removed with a syringe or a chest tube connected to a one-way valve system. Occasionally, surgery is required if tube drainage is unsuccessful. About 17–23 cases of pneumothorax occur per 100,000 people per year, and they are more common in men than in women.
  • Diagnosing a pneumothorax in a chest radiograph is not difficult for an experienced physician or radiologist, but in some cases it can easily be missed. It is usually diagnosed by a radiologist on a chest X-ray and, as discussed above, can sometimes be very difficult to confirm. An accurate AI algorithm to detect pneumothorax would be useful in many clinical scenarios: AI could triage chest radiographs for priority interpretation, or provide a more confident diagnosis for non-radiologists. In other words, a machine-learning-based pneumothorax diagnosis technique for chest X-ray images is needed to assist a physician in diagnosing a pneumothorax.

Source: This problem belongs to one of the competitions held on Kaggle, which can be found at the following link: https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation

1.2 Business Objectives:

  • We have to predict the presence of a pneumothorax and segment it in the X-ray images.
  • The time taken for a prediction should be on the order of a few seconds to a few minutes.

2. Deep Learning problem:

In the section above we saw what a pneumothorax is and how it is diagnosed, so the next step is to formulate the problem as a Deep Learning problem. For that, we will first check what kind of data we have.

2.1 Data:

We have two CSV files: one for the training set and one for the testing set. The training CSV contains the image (X-ray) IDs and their corresponding RLE masks, and the testing CSV contains only the image (X-ray) IDs.

The data comprises images in DICOM format and annotations in the form of image IDs and run-length-encoded (RLE) masks. Some of the images contain instances of pneumothorax (collapsed lung), which are indicated by encoded binary masks in the annotations. Some training images have multiple annotations.

X-ray ID and its corresponding RLE-encoded mask

Images without pneumothorax have a mask value of -1, which means a blank mask.

X-ray ID and its corresponding RLE-encoded mask

This dataset can be found here:

2.2 Mapping the real-world problem to a Deep Learning Problem:

2.2.1 Type of Deep Learning Problem:

As we saw in the data section above, we have a dataset in the form of images, and our task is to predict the mask of the pneumothorax in an X-ray image. This is a semantic image segmentation problem. The resulting model can assist a physician in diagnosing a pneumothorax.

To solve it, we have to use Deep Learning techniques, which are suited to unstructured data such as audio files, video files, and images. In this particular case we have data in the form of X-ray images, which is one kind of unstructured data.

I will give a little introduction to image segmentation.

Basically, image segmentation is the task of classifying the pixels of an image as belonging to particular object classes. Based on the way these pixels are classified, there are broadly two types of segmentation: semantic segmentation and instance segmentation. Consider the images below:

  • Semantic segmentation:

In this technique, all pixels of a similar type are segmented with the same color, as we can see in the image above: it detects the persons in a pink shade and the background in black.

  • Instance segmentation:

This technique segments each similar object in a different color; we can see in the image above that each person is represented by a different color.

As discussed earlier, our problem falls under the semantic segmentation category, where we have to label each pixel as either mask or non-mask (background).

2.2.2 Evaluation metric:

Now that we know this is a semantic segmentation problem, we have to define a metric for evaluating our Deep Learning model. A commonly used and well-suited metric for evaluating segmentation models is the Dice coefficient.

  • Dice coefficient:

The Dice coefficient originates from the Sørensen–Dice coefficient, a statistic developed in the 1940s to gauge the similarity between two samples. It was brought to the computer vision community by Milletari et al. in 2016 for 3D medical image segmentation. The Dice coefficient is equivalent to the F1 score. Put simply, it is 2 times the area of overlap divided by the total number of pixels in both images. It ranges from 0 to 1, with 1 signifying the greatest similarity between prediction and ground truth.

Dice Coefficient = (2 * Area of Overlap)/(total pixels combined)
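In code, the pixel-level Dice coefficient can be computed as follows (a minimal NumPy sketch; the smooth term is my own addition to avoid division by zero on empty masks):

import numpy as np

def dice_coefficient(y_true, y_pred, smooth=1.0):
    # Dice = 2 * |A intersect B| / (|A| + |B|), computed on flattened binary masks
    y_true_f = y_true.flatten()
    y_pred_f = y_pred.flatten()
    intersection = np.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (np.sum(y_true_f) + np.sum(y_pred_f) + smooth)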
  • Loss Metric:

Whenever we solve a machine learning or deep learning problem, we need a trustworthy loss function to check whether our model is improving. Loss functions are chosen based on the type of dataset and the problem we are solving. Ours is an image segmentation problem, and for training segmentation models researchers have found that Binary Cross-Entropy (BCE) plus Dice loss is an excellent combined loss function. This combo loss is very helpful for problems with imbalanced datasets, so I will use it as the loss metric throughout this case study. Combined, the two terms allow for some diversity in the loss while benefiting from the stability of BCE.
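A sketch of this combo loss in Keras (assuming tensorflow.keras, with a soft Dice computed like the NumPy version above; this mirrors the idea, not necessarily the exact code in my repository):

import tensorflow as tf
from tensorflow.keras import backend as K

def dice_coef(y_true, y_pred, smooth=1.0):
    # soft Dice over the whole batch
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def bce_dice_loss(y_true, y_pred):
    # BCE brings stability, Dice handles the foreground/background imbalance
    bce = K.mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))
    return bce + (1.0 - dice_coef(y_true, y_pred))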

3. Existing Approaches:

3.1 4th place solution:

https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/108397

This is the 4th place solution in the competition, based on a U-Net with a deep supervision branch for empty-mask classification. The author used the U-Net model with a ResNet34 backbone and frozen batch normalization, and applied augmentations such as ShiftScaleRotate, RandomBrightnessContrast, ElasticTransform, and HorizontalFlip from the albumentations library. The optimizer was Adam with a batch size of 8. For fast convergence, the proportion of non-empty samples was linearly decreased from 0.8 to 0.22 depending on the epoch. His final ensemble was an average of the 4 best checkpoints over 8 folds.

3.2 Unet Xception Keras for Pneumothorax Segmentation:

https://www.kaggle.com/meaninglesslives/unet-xception-keras-for-pneumothorax-segmentation

This kernel was shared by Siddhartha. It is based on an ImageNet-pretrained Xception encoder with a ResNet decoder. He used cosine annealing and Stochastic Weight Averaging to converge to a better optimum. He notes in the kernel that the model’s performance can definitely be improved with other tricks, one obvious one being K-Fold cross-validation. The augmentations used were ElasticTransform and GridDistortion, and the image size was 256x256. He used the U-Net architecture with Xception as the backbone network and calls it the Uxception model.

4. My first cut approach:

I will use the following steps to start working on the case study:

  1. Data Collection:

For any problem we want to solve using Machine Learning or Deep Learning algorithms, we need a sufficient amount of data. Collecting data from various sources can sometimes be a struggle, and sometimes it is no trouble at all. Thankfully, the competition organizers have provided the data for us; we just have to download it and start working. I will download it quickly using the wget Chrome extension from here:

Note: We can also download the dataset using the Kaggle API. Click here to learn how.

2. Preprocessing:

Since the data is provided in the .dcm (DICOM) format, I will have to extract the images from the .dcm files and make the data suitable for training a Deep Learning model. In this step I will also create the segmentation masks for all the available training data. The mask data is given as run-length encodings, so I will have to generate the masks from those encodings. The organizers have provided a function for this purpose, which I will use.

3. EDA(Exploratory Data Analysis):

Since we have image data, we cannot do as much EDA as in problems with many features to compare. The good thing is that the DICOM files carry some metadata about the images. This metadata may not be directly useful for training a model, but we can get some insights from it to understand the data. I will perform EDA on the available metadata and on the mask information available to us.

4. Model Development:

At this point I will have some insights from the data and from the masks generated in the preprocessing step. This is an image segmentation problem, and there are many deep learning architectures available for the segmentation task. I will use the popular vanilla U-Net architecture as a baseline model, and based on its results I will try different architectures to achieve more reliable model performance.

5. Exploratory Data Analysis:

5.1 Let’s analyze the training data provided to us,

import pandas as pd

# load the training annotations (image IDs + RLE-encoded masks)
train_data = pd.read_csv('siim/train-rle.csv', delimiter=',')
train_data.head()
train_data.info()
  • In the above information, I can see we have a total of 12954 X-ray annotation rows as training data

We have two columns in the dataset:

  • ImageId = ID of X-rays for each patient checked
  • EncodedPixels = Run Length Encoded Pixel data for each X-ray image

Let’s check if there are any duplicate ImageId in the dataset.

# flag rows whose ImageId has already appeared
train_data['isDuplicate'] = train_data['ImageId'].duplicated()
train_data.head()
# check which rows are duplicates
dupImages = train_data.index[train_data['isDuplicate']==True]
print(f"We have total {len(dupImages)} duplicate image ids")
Output: We have total 907 duplicate image ids

These duplicate IDs are the images with multiple annotations mentioned earlier; for simplicity, I drop the duplicate rows,

print(f"With duplicates we have total {len(train_data)} files.")
train_data = train_data.drop(list(dupImages))
print(f"Without duplicates we have total {len(train_data)} files.")
Output: With duplicates we have total 12954 files.
Without duplicates we have total 12047 files.

Now that I have dropped the duplicate ImageIds, I have to add a file path for each ImageId for further processing of the X-ray images,

train_data = train_data.drop('isDuplicate', axis=1)
train_data['ImagePath'] = 'siim/train_dicom/'+ train_data['ImageId']+'.dcm'
# save the .csv file for further use
train_data.to_csv('train_images_dicom.csv', index=False)
train_data.head()

5.2 Let’s analyze testing data provided to us,

test_data = pd.read_csv('siim/stage_2_sample_submission.csv', delimiter=',')
test_data = test_data.drop('EncodedPixels', axis=1)
test_data['ImagePath'] = 'siim/test_dicom/'+ test_data['ImageId']+'.dcm'
# save the .csv file for further use
test_data.to_csv('test_images_dicom.csv', index=False)
test_data.head()
test_data.info()
  • In the above information, I can see we have a total of 3205 X-ray files as testing data
  • We have one column in the dataset:
  • ImageId = ID of X-rays for each patient checked

The second column, with the image path, was added by me.

5.3 Analysis of metadata,

As discussed in the earlier part of this blog, we have images in the DICOM file format. So what is DICOM?

DICOM (Digital Imaging and Communications in Medicine) is another format for storing images, just like .png and .jpeg; the only difference is that a .dcm file can store the image’s metadata along with the image itself. This format is commonly used in the medical imaging field. Nowadays almost all forms of medical imaging have become digitized, and the spectrum of radiology includes not just digital radiographs but also CT scans, MRIs, ultrasound, and nuclear imaging. DICOM is the file format used to store these images, whether an X-ray scan, a CT scan, etc., along with their metadata.

Let’s start analyzing the DICOM files. We have a great library in Python for working with DICOM files: ‘pydicom’.

We can use the following line of code to install the ‘pydicom’ library:

pip install pydicom

The metadata available with the given images is extensive; it holds a lot of information. Not all of it is useful to us, so we will analyze some of the important fields from the files, like patient age, sex, etc.
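For illustration, here is how such a meta_data table could be assembled with pydicom (a sketch; ‘Gender’ and ‘Affection’ are column names I created for this analysis, derived from the DICOM PatientSex tag and from whether the RLE value is -1, and the exact RLE column name in the CSV may differ):

import pydicom
import pandas as pd

records = []
for _, row in train_data.iterrows():
    ds = pydicom.dcmread(row['ImagePath'])   # read one DICOM file
    records.append({
        'ImageId': row['ImageId'],
        'Age': int(ds.PatientAge),           # standard DICOM tag, stored as a string
        'Gender': ds.PatientSex,             # 'M' or 'F'
        # '-1' in the RLE column means a blank mask, i.e. no pneumothorax
        'Affection': 'No' if str(row['EncodedPixels']).strip() == '-1' else 'Yes',
    })
meta_data = pd.DataFrame(records)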

# Check the total no. of males and females in the dataset
mens = len(meta_data[meta_data["Gender"] == "M"])
women = meta_data.shape[0] - mens
print(f"We have total {mens} Males, and total {women} Females in the DataSet.")
Output: We have total 6626 Males, and total 5421 Females in the DataSet.

# Check the number of pneumothorax-affected people and healthy
healthy = len(meta_data[meta_data["Affection"] == "No"])
ill = len(meta_data) - healthy
print(f"We have total {healthy} healthy patients, and {ill} pneumothorax affected patients")
Output: We have total 9378 healthy patients, and 2669 pneumothorax affected patients

Let’s visualize the information found above in the form of pie charts,

  • From the above pie chart, I can see that 55% of the patients are male and 45% are female.
  • This pie chart shows that 78% of the patients are safe (they do not have pneumothorax) and only 22% are affected by pneumothorax.
  • Above is the pie chart for the distribution of pneumothorax with respect to patient gender; the distributions for males and females are nearly the same.
  • 77.5% of males are healthy and 22.5% are affected by pneumothorax.
  • 78.2% of females are healthy and 21.8% are affected by pneumothorax.

It is evident from the chart above that the percentage of affected males and females is nearly the same, so now let’s plot an age histogram to check the age-wise distribution of patients.

Observations:

  • First of all, the overall distribution of age looks approximately, though not exactly, normal.
  • Babies aged 0–6 years are not affected by pneumothorax in this dataset.
  • At every age from 7 to 85 years, there is at least one affected patient.
  • Most of the affected patients are 51 years old.
  • But we cannot say a particular age group is affected, because there is a lot of variance: patients of almost all ages above 6 years are affected.

6. Data Preprocessing:

We have the files in .dcm format, and we cannot use them directly for training the model, so we have to convert them to .png. I also have to create masks for the corresponding images, which will likewise be in .png format.

So let’s start,

6.1 DCM to PNG conversion:

By using the following function we can convert our .dcm files into .png.
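The embedded gist does not render here, but a minimal version of such a conversion looks like this (a sketch assuming pydicom and Pillow; pixel values are rescaled to 8-bit before saving):

import os
import numpy as np
import pydicom
from PIL import Image

def dcm_to_png(dcm_path, png_dir):
    ds = pydicom.dcmread(dcm_path)
    pixels = ds.pixel_array.astype(np.float32)
    # rescale intensities to 0-255 for an 8-bit grayscale PNG
    pixels = (pixels - pixels.min()) / (pixels.max() - pixels.min() + 1e-8) * 255.0
    name = os.path.splitext(os.path.basename(dcm_path))[0] + '.png'
    Image.fromarray(pixels.astype(np.uint8)).save(os.path.join(png_dir, name))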

6.2 Mask Creation:

The mask data we are given is run-length encoded, so we need to understand what that actually is. Run-length encoding (RLE) is a form of lossless data compression in which runs of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs; consider, for example, simple graphic images such as icons, line drawings, Conway’s Game of Life, and animations. It is not useful for files that don’t have many runs, as it could greatly increase the file size.


E.g.

Input: aaaabbbccc

RLE: a4b3c3
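A toy encoder makes the idea concrete (illustration only; the competition masks use a numeric start/length variant of RLE rather than this character form):

def rle_encode(text):
    # collapse each run of repeated characters into char + count
    encoded, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1
        encoded.append(f"{text[i]}{j - i}")
        i = j
    return ''.join(encoded)

print(rle_encode("aaaabbbccc"))  # prints a4b3c3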

Now I have all the files in .png format. The next step is to create ground-truth masks for each image in the training dataset. We have the mask data in the form of run-length-encoded pixels, so we have to convert it into .png images. The organizers have provided a function that builds a mask from the RLE pixels, shown in the code snippet below.
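The embedded snippet does not render here; the organizers’ helper (from the competition’s mask_functions.py, reproduced from memory, so treat the details as a sketch) decodes the relative start/length pairs into a 2-D mask:

import numpy as np

def rle2mask(rle, width, height):
    # decode the competition's relative RLE string into a binary mask
    mask = np.zeros(width * height)
    array = np.asarray([int(x) for x in rle.split()])
    starts = array[0::2]    # offsets relative to the end of the previous run
    lengths = array[1::2]
    current_position = 0
    for index, start in enumerate(starts):
        current_position += start
        mask[current_position:current_position + lengths[index]] = 255
        current_position += lengths[index]
    return mask.reshape(width, height)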

Below is a sample X-ray and its corresponding mask, which I created using the above function,

Sample X-ray with a mask from provided data
  • Above is a sample X-ray and its corresponding mask.
  • In the original X-ray, it is difficult to recognize whether there is a pneumothorax or not.
  • But the ground truths are provided to us, and I have plotted the mask for this X-ray as a red-colored patch; this patch marks the presence of the pneumothorax.
  • The third image is the X-ray with its mask overlaid, so I can see the exact location of the mask on the X-ray.

7. Deep Learning Models:

Note: The main objective of this study is not to achieve great accuracy or to outperform any model; the objective is to explore how the model behaves and to analyze its outputs so as to understand that behavior.

7.1 Vanilla U-Net:

As a first cut approach, I am using the vanilla U-Net architecture described in the paper ‘U-Net: Convolutional Networks for Biomedical Image Segmentation’.

U-Net Architecture

The code used for this vanilla U-Net model is lengthy, so I am not including it here; you can find the whole code in the GitHub repository I have linked.

Following are the results of using the Vanilla U-Net model,

Dice Coef and Loss graphs U-net
  • From the above graphs, I can say that the model behaves consistently on the train and test sets, but it does not improve the score beyond 0.36014.
  • This may be because the U-Net architecture is plain vanilla, so I should try something other than this vanilla structure.

7.2 Double U-Net:

As its name suggests, this is a combination of two U-Net models. The architecture comes from the paper “DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation”.

A short explanation of architecture by the author:

DoubleU-Net starts with a VGG19 as an encoder sub-network, which is followed by a decoder sub-network. In the network, the input image is fed to the modified U-Net (UNet1), which generates a predicted mask (i.e., output1). We then multiply the input image by the produced mask (output1), and this acts as the input for the second modified U-Net (UNet2), which produces another generated mask (output2). Finally, we concatenate both masks (output1 and output2) to get the final predicted mask (output).

Original Double U-Net Architecture

The architecture provided by the author is good, but as we can see it has multiple encoder and decoder blocks, which means a lot of trainable parameters. Due to the unavailability of strong resources for training, I decided to shrink the architecture a little by using 2 blocks instead of 4, as shown in the figure below.

Modified Double U-Net Architecture

Besides removing blocks, I also changed the image data format to ‘channels first’; a tip from experienced practitioners is that the channels-first format can work better than channels-last. The code for the architecture can be found in my GitHub repository.
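As a rough illustration of the data flow described above, here is a channels-first Keras sketch (tiny_unet is a stand-in for the full encoder-decoder sub-networks in the repo, not the real implementation):

from tensorflow.keras.layers import Input, Conv2D, Multiply, Concatenate
from tensorflow.keras.models import Model

def tiny_unet(x, name):
    # stand-in for a full encoder-decoder sub-network
    x = Conv2D(16, 3, padding='same', activation='relu',
               data_format='channels_first', name=name + '_conv')(x)
    return Conv2D(1, 1, activation='sigmoid',
                  data_format='channels_first', name=name + '_mask')(x)

def build_double_unet(input_shape=(1, 256, 256)):
    inputs = Input(shape=input_shape)
    output1 = tiny_unet(inputs, 'unet1')        # first sub-network predicts a mask
    gated = Multiply()([inputs, output1])       # re-weight the input by that mask
    output2 = tiny_unet(gated, 'unet2')         # second sub-network refines it
    # axis=1 is the channel axis in the channels-first format
    return Model(inputs, Concatenate(axis=1)([output1, output2]))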

By using the above architecture I got the following results:

Dice Coef and Loss graphs Double U-net

After 30 epochs the model gave a Dice coefficient of 0.25491. Of course that is not great, but as discussed earlier I am experimenting. I saved the best model, with its 0.25491 Dice coefficient.

After this, I analyzed which types of images the model scores well on and which it scores poorly on. I calculated the Dice score for each image in the dataset and found that the images with a large number of background pixels are the ones on which the model scores highly, meaning the low-scoring images are those with a small number of foreground pixels. This is a classic case of an imbalanced dataset: about 80% of the patients are healthy and 20% are affected.

So, to fight the data imbalance issue, I used the class weighting method and gave more weight to the foreground pixels (which are what we are interested in predicting correctly) and less weight to the majority background pixels. Using this weighted metric, I trained the previous model for 3 more epochs but found no improvement. Below are the predictions made by both models:

Predictions by a model with no weighted metric
Predictions by a model with weighted metric

These plots show that the results are more or less similar to each other. There are many other ways to handle the data imbalance problem besides class weighting; unfortunately, due to time constraints, I trained this model for only 30 epochs and did not try the other methods. But the score can definitely be improved by training the model for 300–1000 epochs, together with other methods such as oversampling the minority class, and augmentation techniques will also help improve the score.
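For reference, the pixel class weighting mentioned above can be written as a weighted BCE (a sketch; the 0.75/0.25 split is illustrative, not the exact weights from the experiment):

from tensorflow.keras import backend as K

def weighted_bce(y_true, y_pred, fg_weight=0.75, bg_weight=0.25):
    # up-weight the rare foreground (mask) pixels, down-weight the background
    y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
    loss = -(fg_weight * y_true * K.log(y_pred)
             + bg_weight * (1.0 - y_true) * K.log(1.0 - y_pred))
    return K.mean(loss)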

8. Final Pipeline:

So now I take the saved model and create a final pipeline that we can think of deploying.

# load the saved model
model.load_weights('/content/drive/MyDrive/27_Case_study_2/best_Double_Unet.hdf5')

We can use the prediction function as follows to predict the pneumothorax (the embedded snippet is sketched below),
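A minimal sketch of such a prediction step (assuming a 256x256 channels-first model and OpenCV for resizing; the filename in the usage line is a placeholder):

import numpy as np
import pydicom
import cv2

def predict_mask(dcm_path, model, size=256, threshold=0.5):
    ds = pydicom.dcmread(dcm_path)
    img = ds.pixel_array.astype(np.float32)
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)   # normalize to [0, 1]
    img = cv2.resize(img, (size, size))
    x = img[np.newaxis, np.newaxis, :, :]                      # (batch, channel, H, W)
    pred = model.predict(x)[0, -1]                             # take the refined mask channel
    return (pred > threshold).astype(np.uint8)                 # binary pneumothorax mask

# usage (placeholder path)
mask = predict_mask('siim/test_dicom/some_image.dcm', model)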

9. Deployment:

I have deployed the model explained above using a Flask API. The following is a demo video of how the deployed web application works,

Pneumothorax Detector Demo Web Application
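For completeness, the Flask wiring is conceptually as small as this (a sketch; the actual app, with its HTML templates and the Colab hosting from the references, lives in the repo):

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    # save the uploaded DICOM, run the model, return the mask area
    request.files['xray'].save('/tmp/upload.dcm')
    mask = predict_mask('/tmp/upload.dcm', model)   # helper sketched in section 8
    return jsonify({'pneumothorax_pixels': int(mask.sum())})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)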

10. Future Extensions:

  1. We can use different methods available for handling data imbalance, better augmentation techniques, etc.
  2. The UNet++ architecture can be used to solve this segmentation problem.

11. References:

[1] https://www.firstaidforfree.com/what-is-a-spontaneous-pneumothorax/

[2] Drbeen explains Pneumothorax: https://www.youtube.com/watch?v=uRMcgLvKeIE

[3] Pneumothorax Animation: https://youtu.be/DgU1HE_6ueI

[4] https://towardsdatascience.com/metrics-to-evaluate-your-semantic-segmentation-model-6bcb99639aa2

[5] https://www.kaggle.com/jesperdramsch/intro-chest-xray-dicom-viz-u-nets-full-data

[6] https://www.kaggle.com/retyidoro/eda-of-pneumothorax-dataset#Exploratory-Data-Analysis-of-Pneumothorax-dataset

[7] https://www.kaggle.com/schlerp/getting-to-know-dicom-and-the-data/notebook

[8] https://www.postdicom.com/en/blog/handling-dicom-medical-imaging-data

[9] https://www.geeksforgeeks.org/run-length-encoding/

[10] U-Net: https://arxiv.org/abs/1505.04597

[11] Double U-Net: https://arxiv.org/pdf/2006.04868.pdf

[12] https://www.appliedaicourse.com/

[13] How To Run Flask In Google Colab

[14] Building a Web Application to Deploy Machine Learning Models | by Joseph Lee Wei En | Towards Data Science

[15] Host static images for your apps or website on Google Drive. (Hotlink to GDrive images) | by Pius Aboyi | Noteworthy — The Journal Blog

Thanks for reading the blog. I would like to thank my mentor throughout this case study and the whole AppliedAiCourse team.

Full Work with all the code can be found here on my Github profile:

Connect with me on LinkedIn:
