Reputation: 3393
I use Stanford CoreNLP for Named Entity Recognition (NER). I've noticed that in some cases it's not 100% accurate, which is fine and not surprising. However, even if a, say, single-word named entity is not recognized (i.e., the label is O), it still has the tag NNP (proper noun).
For example, given the sentence "The RestautantName in New York is the best outlet.", nerTags() yields [O, O, O, LOCATION, LOCATION, O, O, O, O, O], correctly recognizing only "New York". The parse tree for this sentence looks like
(ROOT
  (S
    (NP
      (NP (DT The) (NNP RestautantName))
      (PP (IN in)
        (NP (NNP New) (NNP York))))
    (VP (VBZ is)
      (NP (DT the) (JJS best) (NN outlet)))
    (. .)))
so "RestaurantName" is a proper noun (NNP).
When I look up the definition of a proper noun, it sounds very close to a named entity. What's the difference?
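To make the mismatch concrete, here is a minimal Python sketch that lines up the two tag sequences from the example. The POS tags are read off the leaves of the parse tree above and the NER labels are the nerTags() output quoted above; the function name unlabeled_proper_nouns is hypothetical, and nothing here calls CoreNLP itself.

```python
# Tags copied from the example: POS tags from the parse tree leaves,
# NER labels as quoted from nerTags(). This does not call CoreNLP.
tokens = ["The", "RestautantName", "in", "New", "York",
          "is", "the", "best", "outlet", "."]
pos_tags = ["DT", "NNP", "IN", "NNP", "NNP", "VBZ", "DT", "JJS", "NN", "."]
ner_tags = ["O", "O", "O", "LOCATION", "LOCATION", "O", "O", "O", "O", "O"]


def unlabeled_proper_nouns(tokens, pos_tags, ner_tags):
    """Return tokens tagged NNP/NNPS whose NER label is O (hypothetical helper)."""
    if not (len(tokens) == len(pos_tags) == len(ner_tags)):
        raise ValueError("tokens, pos_tags and ner_tags must have the same length")
    return [
        tok
        for tok, pos, ner in zip(tokens, pos_tags, ner_tags)
        if pos in ("NNP", "NNPS") and ner == "O"
    ]


print(unlabeled_proper_nouns(tokens, pos_tags, ner_tags))
```

For this sentence the only token that is a proper noun without an entity label is the restaurant name.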
Upvotes: 1
Views: 1016
Reputation: 750
The concept of named entities was introduced in the 1990s for information retrieval and extraction purposes. More precisely, it covers "names of interest" in a text for applications such as search engines.
You may read the corresponding Wikipedia page.
In brief, many named entities are not proper nouns: dates, amounts, collective entities, etc. Conversely, you may find proper nouns that are not named entities, but this is rather rare and depends on the application. For instance, language names (English, French, Spanish) are considered proper nouns but may not be named entities. The same goes for History, Humankind, Universe.
So the NLP software has to decide, for each proper noun, whether it is an entity and what its type is, and this is not trivial.
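The overlap between the two categories can be illustrated with a small Python sketch. The sentence and both tag sequences below are hand-assigned for illustration only (they are an assumption, not the output of CoreNLP or any other tool), chosen to mirror the cases above: a language name, a place name, and a date.

```python
# Hand-assigned, illustrative tags (NOT produced by any NLP tool).
tokens = ["She", "learned", "French", "in", "Paris", "in", "2019"]
pos_tags = ["PRP", "VBD", "NNP", "IN", "NNP", "IN", "CD"]
ner_tags = ["O", "O", "O", "O", "LOCATION", "O", "DATE"]

rows = list(zip(tokens, pos_tags, ner_tags))

# Proper nouns that are not entities, entities that are not proper nouns,
# and tokens that are both.
proper_only = [t for t, pos, ner in rows if pos == "NNP" and ner == "O"]
entity_only = [t for t, pos, ner in rows if pos != "NNP" and ner != "O"]
both = [t for t, pos, ner in rows if pos == "NNP" and ner != "O"]

print("proper noun, not an entity:", proper_only)
print("entity, not a proper noun:", entity_only)
print("both:", both)
```

Under these assumed labels, neither category contains the other, which is why a tagger cannot simply map NNP to an entity type.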
Theoretically, the definition of named entities relies on a determined reference that binds the name to an object, whether concrete or abstract. This leads to semiotic and philosophical considerations, so I won't elaborate further, but you may find many articles and books discussing this notion and how it is implemented in software.
Upvotes: 2
Reputation: 8739
The parser is trained on parse treebank data and the named entity recognizer is trained on separate named entity data for PERSON, LOCATION, ORGANIZATION, MISC.
I would've thought that RestaurantName might get marked as MISC, but if it's not getting tagged it means that there are not really examples like that in the training data for named entities. The key point here is that the parse decisions and named entity decisions are made completely independently of each other by separate models trained on separate data.
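Because the two models never consult each other, any reconciliation has to happen afterwards in your own code. Here is a minimal Python sketch of one possible heuristic, assuming you already have per-token POS and NER tags as parallel lists (the values below are copied from the question). The function name candidate_entities and the label "CANDIDATE" are hypothetical and not part of CoreNLP.

```python
from itertools import groupby

# Tags copied from the question; nothing here calls CoreNLP.
tokens = ["The", "RestautantName", "in", "New", "York",
          "is", "the", "best", "outlet", "."]
pos_tags = ["DT", "NNP", "IN", "NNP", "NNP", "VBZ", "DT", "JJS", "NN", "."]
ner_tags = ["O", "O", "O", "LOCATION", "LOCATION", "O", "O", "O", "O", "O"]


def candidate_entities(tokens, pos_tags, ner_tags):
    """Keep NER spans and flag runs of proper nouns labeled O as CANDIDATE."""
    def label(pos, ner):
        if ner != "O":
            return ner              # trust the NER model where it fired
        if pos in ("NNP", "NNPS"):
            return "CANDIDATE"      # proper noun the NER model skipped
        return None                 # ordinary token, ignore

    labeled = [(tok, label(pos, ner))
               for tok, pos, ner in zip(tokens, pos_tags, ner_tags)]
    spans = []
    # Consecutive tokens with the same label are merged into one span.
    for lab, group in groupby(labeled, key=lambda pair: pair[1]):
        if lab is not None:
            spans.append((" ".join(tok for tok, _ in group), lab))
    return spans


print(candidate_entities(tokens, pos_tags, ner_tags))
```

Note that this simple grouping would also merge two distinct adjacent entities of the same type into a single span, and a CANDIDATE span is only a hint for further checking, not a confirmed entity.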
Upvotes: 2