Final-year project: Topological Machine Learning

Student: BARBARA ABIR

Program: Master in Applied Mathematics and Data Science (MASD)

Supervisor: Prof. YAHYAOUYI ALI

Year: 2022

Abstract: Topological Machine Learning (TML) is a recent field that combines two disciplines: Topological Data Analysis (TDA) and machine learning. TDA applies algebraic topology to data in order to extract the information hidden in the shape of the data, that is, the topology or structure of the point cloud. There are two approaches to using TDA in machine learning: the first applies TDA to the input dataset in order to improve the outcomes of machine learning models, and the second applies it to the architecture of a trained deep neural network (DNN) in order to investigate the model's learnability.
DNNs tackle many complicated problems without requiring a specific algorithm with step-by-step instructions, and without the problem having to be solved before it is handed over to the computer. It is the DNN that identifies patterns, correlations, and relations in the data (the examples), and the model's output is the solution, which is revolutionary compared to traditional programming with explicit instructions. The difficulty, however, is that we do not know what happens inside these DNNs, which is why they are called "black boxes". This lack of interpretability and explainability raises many new issues. In the medical field, which is our field of application, it raises ethical concerns: when people's lives are at stake, we cannot allow a system to make decisions without being able to interpret those decisions ("the why of things"). This is all the more important since deep neural networks have already been shown to make biased decisions because of insufficient knowledge, imbalanced data, or other factors. For example, training an HR recruiting model on a dataset that is not gender-balanced can produce gender-discriminatory predictions (e.g., hiring a man over a better-qualified woman simply because men are overrepresented in the training set).
Furthermore, a neural network can easily overfit: the model adapts too closely to the training data, so it performs very well on that data but generalizes poorly. This can go undetected when the test set is very close to the training set, or when labeled data is scarce, which is the case for medical data. Since topological features of a network have been shown to be effective for explaining and interpreting DNNs, in this study we construct a method based on topological data analysis that allows us to address the challenges outlined above.