Reputation: 61
We are having some issues with the Google NLP service: it intermittently fails to return entities for certain terms. We use the NLP annotate API on free-text answers to survey questions. A recent question showed an image of Zippy, a children's TV character in the UK. Some example responses are below. We had thousands of responses like this, and in none of them was "zippy" detected as an entity. Strangely, "elmo", "zippie" and others were detected without any issue; only this specific string ("zippy") came back with no entities. Any ideas why this might be?
{
"sentences": [{
"text": {
"content": "zippy",
"beginOffset": 0
},
"sentiment": {
"magnitude": 0.1,
"score": 0.1
}
}],
"tokens": [],
"entities": [],
"documentSentiment": {
"magnitude": 0.1,
"score": 0.1
},
"language": "en",
"categories": []
}
"rainbow" detected but not "zippy"
{
"sentences": [{
"text": {
"content": "zippy from rainbow",
"beginOffset": 0
},
"sentiment": {
"magnitude": 0.1,
"score": 0.1
}
}],
"tokens": [],
"entities": [{
"name": "rainbow",
"type": "OTHER",
"metadata": [],
"salience": 1,
"mentions": [{
"text": {
"content": "rainbow",
"beginOffset": 11
},
"type": "COMMON"
}]
}],
"documentSentiment": {
"magnitude": 0.1,
"score": 0.1
},
"language": "en",
"categories": []
}
"zippie" detected fine
{
"sentences": [{
"text": {
"content": "zippie",
"beginOffset": 0
},
"sentiment": {
"magnitude": 0,
"score": 0
}
}],
"tokens": [],
"entities": [{
"name": "zippie",
"type": "OTHER",
"metadata": [],
"salience": 1,
"mentions": [{
"text": {
"content": "zippie",
"beginOffset": 0
},
"type": "PROPER"
}]
}],
"documentSentiment": {
"magnitude": 0,
"score": 0
},
"language": "en",
"categories": []
}
"elmo" detected fine
{
"sentences": [{
"text": {
"content": "elmo",
"beginOffset": 0
},
"sentiment": {
"magnitude": 0.1,
"score": 0.1
}
}],
"tokens": [],
"entities": [{
"name": "elmo",
"type": "OTHER",
"metadata": [],
"salience": 1,
"mentions": [{
"text": {
"content": "elmo",
"beginOffset": 0
},
"type": "COMMON"
}]
}],
"documentSentiment": {
"magnitude": 0.1,
"score": 0.1
},
"language": "en",
"categories": []
}
Upvotes: 0
Views: 172
Reputation: 1508
Services like these are trained on a specific corpus of 'entity' values.
The service tokenizes/chunks the text, then uses part-of-speech tagging to identify noun phrases and checks each one against a giant index to see whether it is an entity.
Zippy must not be in the corpus. I'm not sure about Google NLP, but Watson NLU comes with a GUI product for easily creating your own 'dictionary' of entity noun phrases.
It is also very possible to create your own using NLTK or from scratch in Python, but every option requires the effort of manually curating your own 'dictionary', unless you can get your hands on an existing one and adapt it.
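As a rough illustration of the "from scratch in Python" route, here is a minimal sketch of a dictionary lookup that could run as a fallback when the API returns no entities. The `CUSTOM_ENTITIES` dictionary and the `find_custom_entities` function are hypothetical names made up for this example, and the output shape only loosely mirrors the JSON in the question; it is not how Google NLP or Watson NLU work internally.

```python
import re

# Hypothetical hand-curated dictionary: lowercase surface form -> canonical name.
# The entries are illustrative; you would curate these yourself.
CUSTOM_ENTITIES = {
    "zippy": "Zippy",
    "zippie": "Zippy",
    "elmo": "Elmo",
}


def find_custom_entities(text, dictionary=CUSTOM_ENTITIES):
    """Return dictionary matches found in `text`, with character offsets."""
    matches = []
    # Very naive tokenization: runs of letters and apostrophes.
    for token in re.finditer(r"[A-Za-z']+", text):
        name = dictionary.get(token.group().lower())
        if name is not None:
            matches.append({
                "name": name,
                "content": token.group(),
                "beginOffset": token.start(),
            })
    return matches


print(find_custom_entities("zippy from rainbow"))
```

This only handles single-word entries and exact (case-insensitive) matches; multi-word phrases or misspellings would need something more involved, such as the noun-phrase chunking mentioned above.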
Upvotes: 2