Research Article Open Access

Named Entity Recognition for Kannada using Gazetteers list with Conditional Random Fields

K. P. Pallavi (1), L. Sobha (2) and M. M. Ramya (1)
  • 1 Hindustan Institute of Technology and Science, India
  • 2 AUKBC, India

Abstract

Named Entities (NEs) in sentences are essential for building Natural Language Processing (NLP) applications that perform Information Extraction (IE) from large corpora. However, generating a large corpus is challenging for resource-poor languages such as Kannada. Furthermore, no annotated corpus is available online. Two challenges arise in annotating NEs with pre-defined classes: NEs are morphologically joined with other words, and spelling variations are frequent in Kannada words. Sentence structure varies according to the morphology, parts of speech (POS) and chunking of a language, and these parameters differ from one language to another. To address these challenges, a novel system is proposed to identify NEs in Kannada using a large corpus of 73,676 tokens. The Named Entity Recognition (NER) system consists of a robust POS tagger and a Noun Phrase (NP) chunker developed for generic data. Five gazetteer lists were created from many orthographic patterns for each word. Context information, such as the previous two words, the next two words, word morphology and gazetteer lists, was added to the feature lists. A unigram-bigram template was designed and incorporated into Conditional Random Fields (CRFs) to generate conditional feature functions. The proposed system achieved F-measures of 86.85% on gold test data and 71.01% on newspaper data.
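
The context features named in the abstract (previous two words, next two words, word morphology and gazetteer membership) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the gazetteer contents, the feature names, the padding symbol and the use of a three-character suffix as a stand-in for morphology are all assumptions made for illustration, and the example tokens are transliterated placeholders rather than data from the paper.

```python
from typing import Dict, List, Set

# Hypothetical gazetteers: the paper builds five lists, but their
# contents are not given in the abstract, so these are placeholders.
GAZETTEERS: Dict[str, Set[str]] = {
    "person": {"pallavi", "ramya"},
    "location": {"bengaluru", "mysuru"},
}

PAD = "<PAD>"  # assumed symbol for positions outside the sentence


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Build a feature dict for tokens[i] from a window of two words each side."""
    word = tokens[i].lower()
    feats = {"w0": word}
    # Context window: previous two and next two words.
    for offset in (-2, -1, 1, 2):
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"w{offset:+d}"] = tokens[j].lower() if inside else PAD
    # Crude stand-in for word morphology (assumption): the last three characters.
    feats["suffix3"] = word[-3:]
    # One membership flag per gazetteer list.
    for name, entries in GAZETTEERS.items():
        feats[f"gaz_{name}"] = str(word in entries)
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Feature dicts for every token in a sentence, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

Feature dictionaries of this shape could then be passed to a CRF toolkit, where a unigram template would condition each feature on the current label and a bigram template would also condition on the previous label. Which toolkit and template the authors used is not stated in the abstract.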

Journal of Computer Science
Volume 14 No. 5, 2018, 645-653

DOI: https://doi.org/10.3844/jcssp.2018.645.653

Submitted On: 3 October 2017 Published On: 24 February 2018

How to Cite: Pallavi, K. P., Sobha, L. & Ramya, M. M. (2018). Named Entity Recognition for Kannada using Gazetteers list with Conditional Random Fields. Journal of Computer Science, 14(5), 645-653. https://doi.org/10.3844/jcssp.2018.645.653

Keywords

  • Named Entities
  • Natural Language Processing
  • Noun Phrase Chunker
  • Conditional Random Fields