Transformer-Based Natural Language Processing Models for Mining Unstructured Oncology Clinical Notes to Improve Drug Matching

Authors

  • Salvation Ifechukwude Atalor, Department of Computer Science, Prairie View A&M University, Prairie View, Texas, United States
  • Agama Omachi, Department of Economics, University of Ibadan, Ibadan, Nigeria

DOI:

https://doi.org/10.32628/IJSRSET25122197

Keywords:

Transformer Models, Natural Language Processing, Oncology, Clinical Notes, Drug Matching

Abstract

Transformer-based Natural Language Processing (NLP) models have revolutionized the extraction of insights from unstructured clinical text, offering significant advancements in precision medicine. This review explores the application of these models in mining oncology clinical notes to enhance drug matching and personalized treatment strategies. Oncology clinical documentation, often characterized by high variability and complexity, poses challenges to traditional data processing methods. However, transformer architectures such as BERT, GPT, and their domain-specific variants have demonstrated exceptional capabilities in understanding context, semantics, and clinical terminologies. We review recent literature highlighting the use of these models in identifying relevant patient characteristics, treatment histories, and biomarkers that influence therapeutic decisions. Special attention is given to the integration of these models into electronic health record (EHR) systems and their role in improving drug recommendation systems. Additionally, we address current limitations, including model interpretability, data privacy, and generalizability across diverse patient populations. The review concludes by outlining future directions for research, emphasizing the potential of transformer-based NLP in driving more accurate and efficient drug matching in oncology care through better utilization of clinical narratives.
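To make the drug-matching idea concrete, the sketch below shows one minimal way a publicly available clinical BERT checkpoint can embed a de-identified oncology note fragment and score it against candidate-therapy descriptions. This is an illustration only, not the method of any work reviewed here: the checkpoint name, the synthetic note, and the candidate descriptions are assumptions, and a deployed system would draw candidates from a curated knowledge base and require clinical validation rather than relying on cosine similarity over free text.

```python
# Minimal sketch: embed an oncology note with a clinical BERT variant and rank
# candidate therapies by embedding similarity. Assumes the `transformers` and
# `torch` packages are installed and the checkpoint below is downloadable.
import torch
from transformers import AutoTokenizer, AutoModel

# Publicly released clinical BERT checkpoint (Alsentzer et al., 2019); assumed available.
MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(text: str) -> torch.Tensor:
    """Return a mean-pooled sentence embedding for a clinical text snippet."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool the last hidden states over non-padding tokens.
    mask = inputs["attention_mask"].unsqueeze(-1)
    summed = (outputs.last_hidden_state * mask).sum(dim=1)
    return summed / mask.sum(dim=1)

# Synthetic, de-identified oncology note fragment (illustrative only).
note = ("58-year-old female with stage IIIA ER-positive, HER2-negative breast cancer, "
        "status post lumpectomy; prior anthracycline-based chemotherapy.")

# Hypothetical candidate-therapy descriptions; a real system would use a curated
# drug knowledge base, not hand-written strings.
candidates = {
    "tamoxifen": "selective estrogen receptor modulator for ER-positive breast cancer",
    "trastuzumab": "monoclonal antibody targeting HER2-overexpressing breast cancer",
    "pembrolizumab": "PD-1 checkpoint inhibitor for selected advanced solid tumors",
}

note_vec = embed(note)
scores = {
    drug: torch.nn.functional.cosine_similarity(note_vec, embed(desc)).item()
    for drug, desc in candidates.items()
}
for drug, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{drug}: {score:.3f}")
```

In practice, the embedding step would typically be preceded by clinical named-entity recognition and negation handling, and the ranking step replaced by a supervised or knowledge-grounded recommender; the snippet only illustrates how a domain-adapted transformer exposes clinical narrative to downstream matching logic.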

References

Alsentzer, E., Murphy, J. R., Boag, W., Weng, W. H., Jin, D., Naumann, T., & McDermott, M. (2019). Publicly available clinical BERT embeddings. Proceedings of the 2nd Clinical Natural Language Processing Workshop, 72–78. https://doi.org/10.18653/v1/W19-1909

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://doi.org/10.48550/arXiv.2005.14165

Chapman, W. W., Nadkarni, P. M., Hirschman, L., D’Avolio, L. W., Savova, G. K., & Uzuner, Ö. (2011). Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. Journal of the American Medical Informatics Association, 18(5), 540–543. https://doi.org/10.1136/amiajnl-2011-000465

Demner-Fushman, D., Chapman, W. W., & McDonald, C. J. (2009). What can natural language processing do for clinical decision support? Journal of Biomedical Informatics, 42(5), 760–772. https://doi.org/10.1016/j.jbi.2009.08.007

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of NAACL-HLT 2019, 4171–4186. https://doi.org/10.48550/arXiv.1810.04805

El Emam, K., Rodgers, S., & Malin, B. (2015). Anonymising and sharing individual patient data. BMJ, 350, h1139. https://doi.org/10.1136/bmj.h1139

Enyejo, J. O., Adeyemi, A. F., Olola, T. M., Igba, E., & Obani, O. Q. (2024). Resilience in supply chains: How technology is helping USA companies navigate disruptions. Magna Scientia Advanced Research and Reviews, 11(2), 261–277. https://doi.org/10.30574/msarr.2024.11.2.0129

Enyejo, L. A., Adewoye, M. B., & Ugochukwu, U. N. (2024). Interpreting Federated Learning (FL) models on edge devices by enhancing model explainability with computational geometry and advanced database architectures. International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 10(6). https://doi.org/10.32628/CSEIT24106185

Huang, K., Altosaar, J., & Ranganath, R. (2020). ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission. arXiv preprint arXiv:1904.05342. https://doi.org/10.48550/arXiv.1904.05342

Idoko, D. O., Adegbaju, M. M., Nduka, I., Okereke, E. K., Agaba, J. A., & Ijiga, A. C. (2024). Enhancing early detection of pancreatic cancer by integrating AI with advanced imaging techniques. Magna Scientia Advanced Biology and Pharmacy, 12(2), 51–83. https://magnascientiapub.com/journals/msabp/sites/default/files/MSABP-2024-0044.pdf

Igba, E., Ihimoyan, M. K., Awotinwo, B., & Apampa, A. K. (2024). Integrating BERT, GPT, Prophet algorithm, and finance investment strategies for enhanced predictive modeling and trend analysis in blockchain technology. International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 10(6), 1620–1645. https://doi.org/10.32628/CSEIT241061214

Ijiga, A. C., Balogun, T. K., Sariki, A. M., Klu, E., Ahmadu, E. O., & Olola, T. M. (2024). Investigating the influence of domestic and international factors on youth mental health and suicide prevention in societies at risk of autocratization. IRE Journals, 8(5). ISSN: 2456-8880.

Ijiga, A. C., Igbede, M. A., Ukaegbu, C., Olatunde, T. I., Olajide, F. I., & Enyejo, L. A. (2024). Precision healthcare analytics: Integrating ML for automated image interpretation, disease detection, and prognosis prediction. World Journal of Biology Pharmacy and Health Sciences, 18(1), 336–354. https://wjbphs.com/sites/default/files/WJBPHS-2024-0214.pdf

Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L. H., Feng, M., Ghassemi, M., ... & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3, 160035. https://doi.org/10.1038/sdata.2016.35

Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., & Kang, J. (2020). BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4), 1234–1240. https://doi.org/10.1093/bioinformatics/btz682

Liu, F., Shareghi, E., Meng, Y., Basaldella, M., & Collier, N. (2021). Self-alignment pretraining for biomedical entity representations. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics, 4228–4238. https://doi.org/10.18653/v1/2021.naacl-main.334

Luo, R., Sun, L., Xia, Y., Qin, T., Zhang, S., Poon, H., & Liu, T.-Y. (2022). BioGPT: Generative pre-trained transformer for biomedical text generation and mining. Briefings in Bioinformatics, 23(6), bbac409. https://doi.org/10.1093/bib/bbac409

Meystre, S. M., Friedlin, F. J., South, B. R., Shen, S., & Samore, M. H. (2010). Automatic de-identification of textual documents in the electronic health record: a review of recent research. BMC Medical Research Methodology, 10, 70. https://doi.org/10.1186/1471-2288-10-70

Meystre, S. M., Savova, G. K., Kipper-Schuler, K. C., & Hurdle, J. F. (2008). Extracting information from textual documents in the electronic health record: a review of recent research. Yearbook of Medical Informatics, 17(01), 128–144.

Michael, C. I., Campbell, T., Idoko, I. P., Bemologi, O. U., Anyebe, A. P., & Odeh, I. I. (2024). Enhancing cybersecurity protocols in financial networks through reinforcement learning. International Journal of Scientific Research and Modern Technology (IJSRMT), 3(9). https://doi.org/10.38124/ijsrmt.v3i9.58

Murff, H. J., FitzHenry, F., Matheny, M. E., Gentry, N., Kotter, K. L., Crimin, K., ... & Dittus, R. S. (2011). Automated identification of postoperative complications within an electronic medical record using natural language processing. JAMA, 306(8), 848–855. https://doi.org/10.1001/jama.2011.1204

Peng, Y., Yan, S., & Lu, Z. (2019). Transfer learning in biomedical natural language processing: An evaluation of BERT and ELMo on ten benchmarking datasets. Proceedings of the 18th BioNLP Workshop and Shared Task, 58–65. https://doi.org/10.18653/v1/W19-5006

Pustejovsky, J., & Stubbs, A. (2012). Natural Language Annotation for Machine Learning: A guide to corpus-building for applications. O'Reilly Media.

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Technical Report. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140), 1–67. https://jmlr.org/papers/v21/20-074.html

Rasmy, L., Xiang, Y., Xie, Z., Tao, C., Zhi, D., & Xu, H. (2023). MedT5: Generative pretrained transformers for medical text generation and classification. NPJ Digital Medicine, 6(1), 52. https://doi.org/10.1038/s41746-023-00785-1

Savova, G. K., Masanz, J. J., Ogren, P. V., Zheng, J., Sohn, S., Kipper-Schuler, K. C., & Chute, C. G. (2010). Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association, 17(5), 507–513. https://doi.org/10.1136/jamia.2009.001560

Si, Y., Wang, J., Xu, H., & Roberts, K. (2021). Enhancing Clinical Concept Extraction with Contextual Embeddings. Journal of the American Medical Informatics Association, 28(9), 1932–1941. https://doi.org/10.1093/jamia/ocab124

Uzuner, Ö., Solti, I., & Baugh, L. (2018). 2018 n2c2 shared task on clinical text analysis: Overview and evaluation results. Journal of the American Medical Informatics Association, 25(9), 1187–1198. https://doi.org/10.1093/jamia/ocy062

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. https://doi.org/10.48550/arXiv.1706.03762

Wang, Y., Wang, L., Rastegar-Mojarad, M., Moon, S., Shen, F., Afzal, N., ... & Liu, H. (2018). Clinical information extraction applications: a literature review. Journal of Biomedical Informatics, 77, 34–49. https://doi.org/10.1016/j.jbi.2017.11.011

Yang, X., Lyu, T., Rasmy, L., Xu, H., & Zhi, D. (2022). GatorTron: A large language model for clinical natural language processing. NPJ Digital Medicine, 5(1), 194. https://doi.org/10.1038/s41746-022-00791-w

Zhang, Y., Jin, Q., & Xu, H. (2023). OncoBERT: A transformer-based model for oncology clinical text mining. Journal of Biomedical Informatics, 140, 104325. https://doi.org/10.1016/j.jbi.2023.104325

Published

26-04-2025

Issue

Vol. 12, No. 2 (2025)

Section

Research Articles

How to Cite

[1] Salvation Ifechukwude Atalor and Agama Omachi, “Transformer-Based Natural Language Processing Models for Mining Unstructured Oncology Clinical Notes to Improve Drug Matching”, Int J Sci Res Sci Eng Technol, vol. 12, no. 2, pp. 722–740, Apr. 2025, doi: 10.32628/IJSRSET25122197.
