Ohnishi Lab. |
Natural Language Processing Research Theme |
Estimating Parts of Speech and Meanings of Unknown Words |
Unknown words, those not registered in a dictionary, cause natural language processing to break down or produce inaccurate results. A practical system must therefore assume that unknown words will occur. Inferring their meanings is difficult, however, because English words usually have several parts of speech and meanings. We have studied how to estimate the meaning of an unknown word in order to construct a robust natural language processing system. First, we investigated the occurrence rate of unknown words and classified them, using the "LOB Corpus", a comparatively large corpus, and an electronic dictionary. Second, based on these results, we obtained effective clues for inferring meanings at each of the morpheme, phrase, and sentence levels. We also examined the parts of speech of unknown words, which seem important for inferring meaning. |
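The two steps described above, measuring how often unknown words occur and using morpheme-level clues, can be illustrated with a minimal sketch. It is not the lab's actual method: the tiny `DICTIONARY`, the `SUFFIX_CLUES` table, and the function names are all hypothetical, chosen only to show the general idea of a dictionary lookup followed by a suffix-based guess at the part of speech.

```python
import re
from collections import Counter

# Hypothetical toy dictionary; a real system would load an electronic dictionary.
DICTIONARY = {"the", "cat", "sat", "on", "a", "mat", "quickly"}

# Hypothetical suffix clues at the morpheme level (illustration only,
# not the clues obtained in the study).
SUFFIX_CLUES = [
    ("ness", "noun"),
    ("tion", "noun"),
    ("ly", "adverb"),
    ("ize", "verb"),
    ("ous", "adjective"),
]


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def unknown_word_rate(tokens, dictionary):
    """Return (rate, counts) for tokens that are absent from the dictionary."""
    if not tokens:
        return 0.0, Counter()
    unknown = Counter(t for t in tokens if t not in dictionary)
    return sum(unknown.values()) / len(tokens), unknown


def guess_pos(word):
    """Guess a part of speech from the first matching suffix, or None."""
    for suffix, pos in SUFFIX_CLUES:
        # Require a non-empty stem so a bare suffix is not treated as a word.
        if word.endswith(suffix) and len(word) > len(suffix):
            return pos
    return None


if __name__ == "__main__":
    tokens = tokenize("The cat sat on a mat happily.")
    rate, unknown = unknown_word_rate(tokens, DICTIONARY)
    print(f"unknown-word rate: {rate:.3f}")
    for word in unknown:
        print(word, "->", guess_pos(word))
```

A suffix table like this covers only the morpheme level; clues at the phrase and sentence levels would need the surrounding words as input, which this sketch does not attempt.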