Reputation: 347
I would like to find a good way of identifying names of people, places, etc. within users' search queries on my site. For example, if a user asks "how old is George Washington", I need to be able to tell, from a predefined list, that George Washington is a person.
Some of the lists will be global, and some will be user-specific. For example, if a user asks "how old is John Smith", I may only want to identify the particular John Smith who is my associate, and I wouldn't want to identify him as a person at all if he's not my associate.
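A minimal sketch of the lookup described above, assuming the lists fit in memory as Python dictionaries. GLOBAL_ENTITIES, USER_ENTITIES and find_entities are hypothetical names, and the tokenization is deliberately naive (whitespace split plus punctuation stripping). The global list is merged with the current user's list, and the longest phrase is tried first, so a two-word entry like "john smith" is matched as a whole.

```python
# Hypothetical gazetteers: one global list plus per-user lists (lowercased keys).
GLOBAL_ENTITIES = {
    "george washington": "PERSON",
    "mount vernon": "PLACE",
}

USER_ENTITIES = {
    "alice": {"john smith": "PERSON"},  # Alice's associate
    "bob": {},                          # Bob has no associates named here
}


def find_entities(query, user):
    """Return (phrase, label) pairs found in the query, preferring longest matches."""
    # Merge the global list with this user's list; user entries win on conflict.
    lookup = {**GLOBAL_ENTITIES, **USER_ENTITIES.get(user, {})}
    max_len = max((len(key.split()) for key in lookup), default=0)
    # Naive tokenization: split on whitespace, strip punctuation, lowercase.
    tokens = [t.strip("?.,!\"'").lower() for t in query.split()]
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lookup:
                found.append((phrase, lookup[phrase]))
                i += n
                break
        else:
            i += 1  # no entry starts here; move to the next token
    return found
```

With this data, find_entities("how old is John Smith?", "alice") finds the associate, while the same query for "bob" finds nothing, which is the per-user behaviour described in the question.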
Is there an NLP library I could use, or a way of crawling these lists, that would let me leverage Soundex, mature NLP, misspelling handling, and similar functionality? I could write it by hand, but I would rather use something mature. Thanks.
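Python's standard library has no Soundex, but both kinds of fuzzy name matching mentioned can be sketched in a few lines. This is a hand-written version of the classic American Soundex rules, plus difflib.get_close_matches from the standard library (which scores strings with a similarity ratio, not a true edit distance). The helper names soundex, soundex_matches and closest_names are hypothetical, and the 0.8 cutoff is an arbitrary assumption to tune.

```python
import difflib

# American Soundex digit groups; vowels, y, h and w get no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character American Soundex code of a single word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        # 'h' and 'w' do not separate letters with the same code; vowels do.
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]


def soundex_matches(query_name, known_names):
    """Known names whose words all have the same Soundex codes as the query."""
    key = [soundex(w) for w in query_name.split()]
    return [n for n in known_names if [soundex(w) for w in n.split()] == key]


def closest_names(query_name, known_names, cutoff=0.8):
    """Known names whose spelling is similar to the query (typo tolerance)."""
    by_lower = {name.lower(): name for name in known_names}
    matches = difflib.get_close_matches(query_name.lower(), list(by_lower),
                                        n=3, cutoff=cutoff)
    return [by_lower[m] for m in matches]
```

The two are complementary: Soundex catches variants that sound alike ("Jon Smyth" vs "John Smith"), while the similarity ratio catches typos ("George Washingtn").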
Upvotes: 3
Views: 5278
Reputation: 122122
The particular Natural Language Processing (NLP) task that you're looking for is called Named Entity Recognition (NER).
Other than Stanford's CRF-NER (in Java), a popular Python choice is the Natural Language Toolkit (NLTK), whose named-entity chunker is often used as a baseline for NER tasks.
You can install NLTK, download the tokenizer, tagger, and named-entity chunker data it needs (e.g. via nltk.download()), and then run the following code:
>>> from nltk.tokenize import word_tokenize
>>> from nltk.tag import pos_tag
>>> from nltk.chunk import ne_chunk
>>> sentence = "How old is John Smith?"
>>> ne_chunk(pos_tag(word_tokenize(sentence)))
Tree('S', [('How', 'WRB'), ('old', 'JJ'), ('is', 'VBZ'), Tree('PERSON', [('John', 'NNP'), ('Smith', 'NNP')]), ('?', '.')])
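ne_chunk only says that "John Smith" looks like a person; it cannot tell which John Smith. One way to reconcile that with the question's per-user requirement is to post-filter the tagger's output against the global and user-specific lists. The sketch below assumes the entities have already been pulled out of the tree as (text, label) pairs (for example by walking tree.subtrees() and reading each subtree's label() and leaves()); GLOBAL_PEOPLE, USER_PEOPLE and filter_entities are hypothetical names.

```python
# Hypothetical lists: people anyone may ask about, plus per-user associates.
GLOBAL_PEOPLE = {"george washington"}
USER_PEOPLE = {"alice": {"john smith"}}


def accept_person(entity_text, user):
    """True if the name is on the global list or on this user's list."""
    known = GLOBAL_PEOPLE | USER_PEOPLE.get(user, set())
    return entity_text.lower() in known


def filter_entities(entities, user):
    """Drop PERSON entities the user does not know; keep other labels as-is.

    entities: (text, label) pairs produced by an NER tagger.
    """
    return [(text, label) for text, label in entities
            if label != "PERSON" or accept_person(text, user)]
```

In this split, the tagger decides "this looks like a person" and the list decides "this is the person the user means".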
Upvotes: 2
Reputation: 15432
What you need is called Named Entity Recognition (NER).
One of the best available tools for it is Stanford NER, part of the Stanford NLP software: http://nlp.stanford.edu/software/CRF-NER.shtml (written in Java).
If you are on another platform, there are good open-source projects in Ruby and Python; search for "Named Entity Recognition".
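If you use the Stanford tool from a non-Java stack, one low-tech option is to run its command-line classifier and parse the text it prints. The sketch below assumes the slash-tagged output style (tokens such as John/PERSON Smith/PERSON, with O marking non-entities) and groups consecutive tokens that share a tag into one entity; parse_slash_tags is a hypothetical helper, not part of Stanford NER.

```python
def parse_slash_tags(tagged):
    """Turn 'word/TAG word/TAG ...' text into (entity_text, tag) pairs."""
    entities = []
    current_words, current_tag = [], None
    for token in tagged.split():
        # Split on the last slash so words containing '/' stay intact.
        word, _, tag = token.rpartition("/")
        if tag != "O" and tag == current_tag:
            current_words.append(word)  # continue the current entity
            continue
        if current_tag is not None:
            # The previous entity ended; record it.
            entities.append((" ".join(current_words), current_tag))
        if tag == "O":
            current_words, current_tag = [], None
        else:
            current_words, current_tag = [word], tag  # start a new entity
    if current_tag is not None:
        entities.append((" ".join(current_words), current_tag))
    return entities
```

Because adjacent tokens with the same tag are merged, two different people named back to back would come out as a single entity; that is a limitation of this simple grouping.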
Upvotes: 4