Improving Medical Document Classification via Feature Engineering
Document classification (DC) is the task of assigning predefined labels to unseen documents using a trained model. DC has gained significance in healthcare for applications such as clinical risk factor categorization and disease classification. It aids medical document management and can reduce costs through improved decision-making. Medical documents have distinctive attributes, such as acronyms and abbreviations, that make them challenging to analyze. Current performance in classifying medical documents is unsatisfactory, partly because privacy concerns limit data availability. This presentation proposes innovative approaches, including domain-specific feature extraction and ontology-based data augmentation, to enhance DC performance. It also introduces dictionary-based data oversampling methods to address imbalanced datasets, yielding improved classification results.
A short bio
Mahdi Abdollahi holds a B.Sc. in Software Engineering and an M.Sc. in Computer Science from Iran. He served as a lecturer at Shahid Madani University of Azarbaijan for three years. He received his PhD in Computer Science from Victoria University of Wellington, New Zealand. He currently works as a data scientist in the financial crime area at the Bank of New Zealand (BNZ). His research centers on artificial intelligence, including text mining, evolutionary computation, global optimization techniques, and their applications. It covers areas such as feature selection, multi-objective optimization, computational and swarm intelligence, data analysis, natural language processing, and machine learning. He also serves as a regular reviewer for prominent journals in the field.