PDF Malware Detection: Toward Machine Learning Modelling with Explainability Analysis
Keywords:
PDF malware detection, machine learning, Random Forest, SVM, DNN, explainability, cybersecurity, malicious PDF, classification algorithms, Kaggle dataset
Abstract
PDF files are widely used for document sharing, and that popularity makes them a frequent target for malware attacks. This project, titled "PDF Malware Detection: Toward Machine Learning Modeling With Explainability Analysis," aims to develop and evaluate machine learning models for detecting malware in PDF files. Using a Kaggle dataset of PDFs labeled as malicious or benign, several algorithms will be applied: Random Forest, C5.0, J48, Support Vector Machine (SVM), AdaBoost, Deep Neural Network (DNN), Gradient Boosting Machine (GBM), and K-Nearest Neighbors (KNN). The primary focus is high detection accuracy combined with explainability, so that the decision-making process of the models can be understood. By applying these techniques, the project seeks to strengthen cybersecurity measures by helping to identify and mitigate threats embedded in PDF documents.
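The pairing of a classifier with an explainability step can be illustrated with a minimal sketch. The abstract does not say which explainability method the project uses, so the example below assumes permutation feature importance, applied to a small K-Nearest Neighbors classifier written with the Python standard library only. The data is synthetic, not the Kaggle dataset, and the feature names `js_object_count` and `noise` are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training points closest to x (squared Euclidean)."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def permutation_importance(train_X, train_y, test_X, test_y, n_repeats=10, seed=0):
    """Mean accuracy drop when one test column is shuffled, for each column."""
    rng = random.Random(seed)
    baseline = accuracy(train_X, train_y, test_X, test_y)
    importances = []
    for col in range(len(test_X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in test_X]
            rng.shuffle(shuffled)
            permuted = [row[:col] + [v] + row[col + 1:] for row, v in zip(test_X, shuffled)]
            drops.append(baseline - accuracy(train_X, train_y, permuted, test_y))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


def make_synthetic(n=200, seed=42):
    """Synthetic stand-in data: column 0 separates the classes, column 1 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 = malicious, 0 = benign (assumed encoding)
        X.append([rng.gauss(3.0 if label else 0.0, 0.5), rng.gauss(0.0, 0.5)])
        y.append(label)
    return X, y


FEATURES = ["js_object_count", "noise"]  # hypothetical feature names
X, y = make_synthetic()
train_X, train_y, test_X, test_y = X[:150], y[:150], X[150:], y[150:]
baseline, importances = permutation_importance(train_X, train_y, test_X, test_y)

print(f"baseline accuracy: {baseline:.2f}")
for name, imp in zip(FEATURES, importances):
    print(f"{name}: mean accuracy drop {imp:.2f}")
```

A feature whose shuffling sharply reduces accuracy is one the model relies on, while a near-zero drop suggests the feature contributes little. The same procedure is model-agnostic, so in principle it could be applied to any of the classifiers listed above.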
License
Copyright (c) 2025 International Journal of Scientific Research in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.