Research Papers

Participating Faculty / Kim Jin-woong (김진웅) et al., "A Method for Processing Messenger Text for Morphological Analysis"


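Kim Jin-woong (김진웅) et al., "A Method for Processing Messenger Text for Morphological Analysis" (형태 분석을 위한 메신저 텍스트 처리 방안), Text Linguistics (텍스트언어학) 49, December 2020.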


Abstract


This study presents a methodology that facilitates the morphological analysis of user-generated content (UGC) by normalizing non-standard text, and shows that it can improve the performance of existing NLP tools. KakaoTalk messenger texts were morphologically analyzed with Utagger, and the error rate was measured at each of three steps. Improving the accuracy of morphological analysis requires first understanding the characteristics of messenger data. In messenger data, spacing norms are not observed and punctuation marks are often omitted. Writers use variants spelled as they sound or with partially altered letters, and typos are frequent. In addition, various abbreviations are used, new words and dialect forms appear, and novel sentence-final endings are common. These linguistic features can cause part-of-speech (POS) errors and should be taken into account during normalization. Correcting word-spacing errors reduced the error rate from 35.1% to 17.7%, and spell checking then reduced it by 12.2%. The remaining errors were sentence-final endings misanalyzed as connective endings because punctuation marks were missing, and errors involving new proper nouns, new words, and interjections.
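The normalization idea in the abstract can be illustrated with a minimal, hypothetical sketch in Python. The abstract does not say how normalization is implemented, so the variant table, the function names `collapse_repeats` and `normalize`, and the rule of shortening runs of three or more identical characters are all assumptions for illustration, not the paper's procedure. Spacing correction and spell checking, which the study reports as the main sources of improvement, need dedicated tools and are not modeled here.

```python
import re
from typing import Dict

# Hypothetical variant table (an assumption for illustration, not the paper's
# resource): maps non-standard messenger forms to standard forms.
VARIANTS: Dict[str, str] = {
    "ㄱㅅ": "감사",    # consonant-only abbreviation of 감사 ("thanks")
    "ㅇㅋ": "오케이",  # consonant-only abbreviation of 오케이 ("okay")
    "넘": "너무",      # shortened spelling of 너무 ("too, very")
}

# Any character repeated three or more times in a row, e.g. "ㅋㅋㅋㅋ".
_REPEAT = re.compile(r"(.)\1{2,}")


def collapse_repeats(text: str) -> str:
    """Shorten runs of 3+ identical characters to exactly two (hypothetical rule)."""
    return _REPEAT.sub(r"\1\1", text)


def normalize(text: str, variants: Dict[str, str] = VARIANTS) -> str:
    """Collapse repeated characters, then replace whole whitespace-delimited
    tokens found in the variant table. Tokens with attached punctuation or
    particles are left unchanged by this simple lookup."""
    text = collapse_repeats(text)
    tokens = [variants.get(token, token) for token in text.split()]
    return " ".join(tokens)


if __name__ == "__main__":
    print(normalize("ㅇㅋ 넘 좋아ㅋㅋㅋㅋ"))  # 오케이 너무 좋아ㅋㅋ
```

Whole-token lookup is used instead of substring replacement so that a form such as 넘 is not rewritten inside an unrelated word like 넘다 ("to cross"). Because messenger text often ignores spacing norms, a realistic pipeline would presumably correct spacing before a lookup of this kind.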
