Exploring the transferability of cancer prediction models across races
Cancer is a leading cause of death worldwide. Researchers consider the incidence and prognosis of cancers to be related both to controllable factors, such as lifestyle and environment, and to uncontrollable factors, such as race and genes. The interplay of these factors makes both the incidence and the prognosis of the disease difficult to understand. Genes, and with them race, emerge as a significant factor: the prevalence of cancer varies across races. For example, the breast cancer prevalence rate is highest among non-Hispanic White and non-Hispanic Black women and lowest among Hispanic women.
The Machine Learning (ML) community has been instrumental in understanding how various factors are associated with the prevalence, prognosis, and incidence of this deadly disease. Existing ML-based work builds and evaluates models on homogeneous data, meaning that the source (training) and target (prediction) populations are the same. In this research, we will investigate transferability in cross-racial scenarios. The exploration will focus on whether a model trained on cancer prevalence data from race X remains effective when predicting on data from race Y. Since ML models inherit a certain degree of bias from their training data, we expect that a model trained on data from race X may not perform well on data from race Y. If this expectation is confirmed, we will focus on designing schemes to mitigate the issue and restore the transferability of the models.
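The cross-racial evaluation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the data are synthetic and randomly generated (not cancer data), the two groups are hypothetical stand-ins for race X and race Y, the feature-outcome relationships are invented, and a simple logistic regression stands in for whatever model the project eventually uses. The sketch trains on group X, then compares accuracy on held-out data from group X with accuracy on data from group Y.

```python
import math
import random


def make_group(n, w_true, rng, noise=0.3):
    """Draw n synthetic (features, label) pairs for one hypothetical group.

    w_true is an invented feature-outcome relationship, not a real finding.
    """
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        score = w_true[0] * x[0] + w_true[1] * x[1] + rng.gauss(0.0, noise)
        data.append((x, 1 if score > 0 else 0))
    return data


def sigmoid(z):
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(data, epochs=200, lr=0.1):
    """Fit a two-feature logistic regression with batch gradient descent."""
    w, b = [0.0, 0.0], 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in data:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


def accuracy(model, data):
    """Fraction of examples whose predicted label matches the true label."""
    w, b = model
    correct = sum(
        1 for x, y in data
        if (w[0] * x[0] + w[1] * x[1] + b > 0) == (y == 1)
    )
    return correct / len(data)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical groups whose features relate to the outcome differently.
train_x = make_group(400, (2.0, 0.5), rng)   # source: group X (training)
test_x = make_group(200, (2.0, 0.5), rng)    # held-out data from group X
test_y = make_group(200, (-1.0, 2.0), rng)   # target: group Y

model = train_logistic(train_x)
in_group_acc = accuracy(model, test_x)
cross_group_acc = accuracy(model, test_y)

print(f"Accuracy on group X (in-group):    {in_group_acc:.2f}")
print(f"Accuracy on group Y (cross-group): {cross_group_acc:.2f}")
```

The gap between the two printed accuracies is one possible measure of how much transferability is lost. In this sketch the gap is built in by construction, so it says nothing about real cancer data; it only shows the shape of the comparison the study would make.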