Project
In the summer of 2023, I worked at University Health Network (UHN), consulting with physicians on how to leverage artificial intelligence in precision oncology. Alongside my role at UHN, I was a member of the Data Science Institute at the University of Toronto that summer. We recommended and built a prototype of SlideSleuth, a tool that uses an unsupervised deep learning model to analyze slide images of lung adenocarcinoma (LUAD). LUAD is a challenging form of cancer to diagnose because of the complexity of the slide images produced in a clinical setting. Our goal was to extract relevant features from the slide images to aid expert physicians in their diagnoses.
Our model was a convolutional variational autoencoder built in Python and TensorFlow. The model reduces each image to a smaller feature vector, effectively embedding the image. We then further reduce the dimension of the feature vectors via clustering and aim to map feature vector dimensions to image attributes, giving physicians a way to interpret the model. Our private dataset consists of 158 slide images from 106 patients at UHN; each slide image was cropped into smaller tiles, for a total of over 350,000 tiles. Data processing and pipelining were done in Python and R, and data visualization was done in R. By the end of the project, we had confirmed that the model was recognizing and extracting salient features of the images. Next steps include further consultation with physicians to improve model interpretation.
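To give a sense of the tiling step described above, here is a minimal sketch of how a slide image could be split into a grid of tiles. It is an illustration only, not the SlideSleuth implementation: the function name `tile_boxes`, the square tile size, the non-overlapping default stride, and the choice to drop partial tiles at the edges are all assumptions.

```python
from typing import Iterator, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def tile_boxes(width: int, height: int, tile_size: int,
               stride: Optional[int] = None) -> Iterator[Box]:
    """Yield crop boxes for square tiles covering a width x height image.

    Hypothetical helper: only complete tiles are produced, so any
    leftover strip at the right or bottom edge is dropped (an assumption).
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    step = tile_size if stride is None else stride
    if step <= 0:
        raise ValueError("stride must be positive")
    # Walk the image row by row, left to right.
    for top in range(0, height - tile_size + 1, step):
        for left in range(0, width - tile_size + 1, step):
            yield (left, top, left + tile_size, top + tile_size)


# Example: a 10 x 7 pixel image cut into 3 x 3 tiles.
boxes = list(tile_boxes(10, 7, 3))
print(len(boxes), boxes[0], boxes[-1])
```

Each box can then be passed to an image library's crop routine to produce the tile that is fed to the autoencoder. Setting a stride smaller than the tile size would give overlapping tiles instead.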
Technologies
Python
TensorFlow
Pandas
NumPy
R