Jeff

Reputation: 347

Identify names in a string

I would like to find a good way of identifying names of people, places, etc. within users' search queries on my site. For example, if a user asks "how old is George Washington", I need to be able to tell from a predefined list that George Washington is a person.

Some of the lists will be global, and some will be user-specific. For example, if they asked "how old is John Smith", I may only want to identify the particular John Smith who is my associate, and I wouldn't want to identify him as a person if he's not my associate.

Is there any NLP library, or a way of crawling these lists, that would give me Soundex matching, misspelling tolerance, and similar functionality? I could write it by hand, but I would rather leverage something mature. Thanks.

Upvotes: 3

Views: 5278

Answers (2)

alvas

Reputation: 122122

The Natural Language Processing (NLP) task you're looking for is called Named Entity Recognition (NER).

Besides Stanford's CRF-NER (written in Java), a popular Python choice is the Natural Language Toolkit (NLTK), whose built-in chunker is often used as a baseline for NER tasks.

You can install NLTK, fetch its tokenizer, tagger, and chunker data with nltk.download(), and then run the following:

>>> from nltk.tokenize import word_tokenize
>>> from nltk.tag import pos_tag
>>> from nltk.chunk import ne_chunk
>>> sentence = "How old is John Smith?"
>>> ne_chunk(pos_tag(word_tokenize(sentence)))
Tree('S', [('How', 'WRB'), ('old', 'JJ'), ('is', 'VBZ'), Tree('PERSON', [('John', 'NNP'), ('Smith', 'NNP')]), ('?', '.')])
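
Note that NER on its own will tag any John Smith as a person. For the predefined global and per-user lists described in the question, one option is a plain dictionary lookup with some tolerance for misspellings. Below is a minimal sketch using only difflib from the Python standard library. The list contents, the "jeff" user key, the find_names helper, and the 0.85 cutoff are all hypothetical choices for illustration, not part of NLTK or any other library.

```python
import difflib
import re

# Hypothetical data: one list shared by everyone, plus one list per user.
GLOBAL_NAMES = {"George Washington": "PERSON", "New York": "PLACE"}
USER_NAMES = {"jeff": {"John Smith": "PERSON"}}


def find_names(query, user, cutoff=0.85):
    """Return (matched_text, canonical_name, label) for known names in query."""
    # Combine the global list with the entries of this user only.
    known = dict(GLOBAL_NAMES)
    known.update(USER_NAMES.get(user, {}))

    words = re.findall(r"[A-Za-z']+", query)
    found = []
    for name, label in known.items():
        size = len(name.split())
        # Slide a window with as many words as the known name has.
        for start in range(len(words) - size + 1):
            candidate = " ".join(words[start:start + size])
            # Similarity in [0, 1]; 1.0 means identical after lowercasing.
            score = difflib.SequenceMatcher(
                None, candidate.lower(), name.lower()
            ).ratio()
            if score >= cutoff:
                found.append((candidate, name, label))
    return found


print(find_names("how old is George Washington", "jeff"))
print(find_names("how old is Jon Smith", "jeff"))   # misspelled first name
print(find_names("how old is John Smith", "alice"))  # not on alice's list
```

The cutoff trades recall for precision and would need tuning on real queries. A phonetic key such as Soundex could replace or complement the character-level similarity used here.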

Upvotes: 2

Blacksad

Reputation: 15432

What you need is called Named Entity Recognition.

One of the best available tools for this is Stanford NER, which comes with Stanford NLP: http://nlp.stanford.edu/software/CRF-NER.shtml (written in Java)

If you are on another platform, there are good open-source projects in Ruby and Python. Search for "Named Entity Recognition".

Upvotes: 4
