About

A PERFORMANCE COMPARISON OF MACHINE LEARNING ALGORITHMS IN THE DIAGNOSIS OF HEART DISEASE
by
Henri Prudhomme


Heart disease is the leading cause of death in the world today. In the United States alone, approximately 610,000 people die of heart disease each year. There are various indicators of heart disease progression, and making patients aware of these vulnerabilities can save lives. By publishing models on our website, https://machineu1.github.io, this project demonstrates the power of machine learning and its use in the prediction of heart disease. The purpose of this research project was to learn various machine learning algorithms and implement them to predict the probability of heart disease from a dataset. The fourteen-variable dataset used in this project was the Heart Disease dataset from the UCI Machine Learning Repository. It provided over 350 data points, each consisting of thirteen attributes and one outcome, which were split at a 70 to 30 ratio for training and testing each of the models. The models were generated from the following implemented algorithms: the Random Forest Regression Algorithm (RFRA), the Multivariable Linear Regression Algorithm (MLRA), and the Random Forest Classification Algorithm (RFCA). The performance of each model was calculated by comparing the model's prediction to the actual outcome recorded for the data point. A mean accuracy was determined by building 1000 models for each algorithm and averaging the resulting accuracies. The MLRA had a mean accuracy of 72.3%, the RFCA had a mean accuracy of 97.8%, and the RFRA had a mean accuracy of 100%. In conclusion, the RFRA and the RFCA models resulted in near-perfect prediction of the probability of heart disease. Future improvements could include evaluating a larger number of distinct algorithms as well as using a larger training dataset.
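
The evaluation procedure described above (a 70 to 30 split, a model built on the training portion, accuracy measured against the recorded outcomes, and the result averaged over many rebuilds) could be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code. The function names, the `(features, outcome)` row layout, the choice to draw a fresh random split for every rebuild, and the majority-class stand-in model are all assumptions made for the example; in practice any of the three algorithms could be supplied in place of the stand-in.

```python
import random
from statistics import mean


def split_70_30(rows, rng, train_fraction=0.7):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, test_rows):
    """Fraction of test rows whose predicted outcome equals the recorded outcome."""
    correct = sum(1 for features, outcome in test_rows if predict(features) == outcome)
    return correct / len(test_rows)


def mean_accuracy(fit, rows, runs=1000, seed=0):
    """Rebuild the model `runs` times on fresh splits and average the accuracies."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        train_rows, test_rows = split_70_30(rows, rng)
        predict = fit(train_rows)  # fit returns a function: features -> outcome
        scores.append(accuracy(predict, test_rows))
    return mean(scores)


def fit_majority_class(train_rows):
    """Hypothetical stand-in model: always predicts the most common training outcome."""
    outcomes = [outcome for _, outcome in train_rows]
    majority = max(set(outcomes), key=outcomes.count)
    return lambda features: majority
```

Calling `mean_accuracy(fit_majority_class, rows)` on a list of `(features, outcome)` pairs returns a single averaged score between 0 and 1. A regression model would additionally need a rule for turning its numeric output into a predicted outcome before it can be scored this way; the source does not say which rule was used, so none is assumed here.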