Call between 8 a.m. and 4 p.m.
Mail us for support
Laboratory address
Aleksandra Medvedeva 4
Niš, Serbia
Advancing healthcare through technology
Avdić, Aldina; Marovac, Ulfeta; Janković, Dragan
Automated labeling of terms in medical reports in Serbian (Journal Article)
In: Turkish Journal of Electrical Engineering and Computer Sciences, vol. 28, no. 6, pp. 3285–3303, 2020, (All Open Access, Bronze Open Access).
Tags: Errors; Automated labeling; Automatic labeling; Electronic health; Medical dictionary; Medical domains; Natural languages; Processing errors; Supervised methods; Diagnosis
@article{Avdic20203285,
title = {Automated labeling of terms in medical reports in Serbian},
author = {Aldina Avdi\'{c} and Ulfeta Marovac and Dragan Jankovi\'{c}},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85096654512\&doi=10.3906%2fELK-2002-9\&partnerID=40\&md5=b1b90c80b11480f73353083d67f53750},
doi = {10.3906/ELK-2002-9},
year = {2020},
date = {2020-01-01},
journal = {Turkish Journal of Electrical Engineering and Computer Sciences},
volume = {28},
number = {6},
pages = {3285--3303},
publisher = {Turkiye Klinikleri},
abstract = {Nowadays, many electronic health reports (EHRs) are stored daily. They consist of the structured part and of an unstructured section written in natural language. Due to the limited time for medical examination, EHRs are short reports which often contain errors and abbreviations. Therefore it is a challenge to process an EHR and extract knowledge from this part of the text for different purposes. This paper compares the results of three proposed methods for automatic labeling of medical terms in unstructured parts of EHRs. All words are categorized as words within the medical domain (symptoms, diagnoses, therapies, anatomy, specialties etc.) and those beyond the medical domain (numbers, places, stop words etc.). The first method is based on dictionaries of medical terms, the second on the training set, and the third on the training set and rules. The results of application of different methodologies to reduce a word to its basic form (pure, prefix, stem) are given for each of the methods. The paper shows that in labeling medical terms, the methods based on medical dictionaries (diagnosis, symptoms, medications etc.) do not produce best results, therefore it is better to use manually annotated part of the data set as a model. A significant number of words (17.36\%) in medical reports are abbreviations and errors, so for better results, we should focus on creating rules to solve this problem. Better results are obtained for supervised methods compared to the dictionary-based method (with relative improvement of 42.82\%). The inclusion of the algorithm for processing errors and abbreviations increased the results (with a relative improvement of 4.21\%) and gave the largest F1 measure (0.9082). The advantage of the proposed method is that the use of rules for processing errors and abbreviations provides good results regardless of how the word is reduced to its basic form. © T\"{U}B\.{I}TAK},
note = {All Open Access, Bronze Open Access},
keywords = {Errors; Automated labeling; Automatic labeling; Electronic health; Medical dictionary; Medical domains; Natural languages; Processing errors; Supervised methods; Diagnosis},
pubstate = {published},
tppubtype = {article}
}