Reputation: 785
I'm trying to build a very simple NLP chat (I could even say pseudo-NLP?), where I want to identify a fixed subset of intentions (verbs, sentiments) and entities (products, etc.).
It's a kind of entity identification or named-entity recognition, but I'm not sure I need a full-fledged NER solution for what I want to achieve. I don't need to handle the person typing cars instead of car: they have to type the exact word. So there's no need to deal with language processing here.
It doesn't need to identify and classify the words; I'm just looking for a way that, when I search for a phrase, returns every indexed entry whose word appears in it.
I want to index something like:
want [type: intent]
buy [type: intent]
computer [type: entity]
car [type: entity]
Then the user will type:
I want to buy a car.
Then I send this phrase to Elasticsearch/Solr/whatever, and it should return something like the following (it doesn't have to be structured like that, but each word should come with its type):
[
{"word":"want", "type":"intent"},
{"word":"buy", "type":"intent"},
{"word":"car", "type":"entity"}
]
The approach I came up with was indexing each word as:
{
"word": "car",
"type": "entity"
}
{
"word": "buy",
"type": "intent"
}
And then I provide the whole phrase, searching by "word". But I've had no success so far, because Elasticsearch doesn't return any of the words, even though the phrase contains words that are indexed.
Any insights/ideas/tips for doing this with one of the main search engines?
If I do need to use a dedicated NER solution, what would be the approach to annotate words like this, without having to worry about fixing typos, multiple languages, etc.? I want to return results only if the person types the intents and entities exactly as they are, so this is not an advanced NLP solution.
Curiously, I didn't find much about this on Google.
Upvotes: 0
Views: 158
Reputation: 12672
I created a basic index and indexed some documents like this:
PUT nlpindex/mytype/1
{
"word": "buy",
"type": "intent"
}
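If there are many words to index, one request per document gets tedious. Elasticsearch's bulk API accepts a newline-delimited JSON body (an action line followed by a source line for each document), which could be posted to nlpindex/mytype/_bulk. The following is only a minimal sketch of building that body; the VOCABULARY list and the build_bulk_body helper are hypothetical names made up for illustration, and the words are the ones from the question.

```python
import json

# Hypothetical vocabulary, copied from the examples in the question.
VOCABULARY = [
    {"word": "want", "type": "intent"},
    {"word": "buy", "type": "intent"},
    {"word": "computer", "type": "entity"},
    {"word": "car", "type": "entity"},
]


def build_bulk_body(docs):
    """Return an NDJSON body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        # An empty "index" action leaves index, type and _id to the request
        # path and to Elasticsearch (assumption: the target is in the URL).
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    # The bulk endpoint expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body(VOCABULARY), end="")
```

The printed text is what would go in the request body; sending it is left out here on purpose, since the client and version in use are not known.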
I used a query_string query to search for all the words that appear in a phrase:
GET nlpindex/_search
{
"query": {
"query_string": {
"query": "I want to buy a car",
"default_field": "word"
}
}
}
By default the operator is OR, so it will search for every single word of the phrase in the word field.
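To make the OR behaviour concrete, here is a rough plain-Python emulation of what the query does: the phrase is split into lowercase tokens and each token is looked up on its own. This is only a sketch, assuming the word field is analyzed with something like the default analyzer (lowercasing and splitting on non-word characters); INDEXED and annotate are hypothetical names, and no search engine is involved.

```python
import re

# Hypothetical in-memory stand-in for the indexed documents (word -> type).
INDEXED = {
    "want": "intent",
    "buy": "intent",
    "computer": "entity",
    "car": "entity",
}


def annotate(phrase, indexed=INDEXED):
    """Return every indexed word that occurs in the phrase, with its type."""
    # Lowercase and split on non-word characters, so "car." becomes "car".
    tokens = re.findall(r"\w+", phrase.lower())
    seen = set()
    results = []
    for token in tokens:
        # Exact match only: "cars" does not match "car".
        if token in indexed and token not in seen:
            seen.add(token)
            results.append({"word": token, "type": indexed[token]})
    return results


if __name__ == "__main__":
    print(annotate("I want to buy a car."))
```

Tokens such as "I", "to" and "a" simply match nothing, which is why the OR operator matters: with AND, a single unmatched token would rule out every document.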
These are the results I get:
"hits": [
{
"_index": "nlpindex",
"_type": "mytype",
"_id": "1",
"_score": 0.09427826,
"_source": {
"word": "car",
"type": "entity"
}
},
{
"_index": "nlpindex",
"_type": "mytype",
"_id": "4",
"_score": 0.09427826,
"_source": {
"word": "want",
"type": "intent"
}
},
{
"_index": "nlpindex",
"_type": "mytype",
"_id": "3",
"_score": 0.09427826,
"_source": {
"word": "buy",
"type": "intent"
}
}
]
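If you need exactly the shape from the question, the hits can be reduced on the client side by keeping only the _source of each one. A minimal sketch, assuming the hits list has already been parsed from the JSON response; hits_to_annotations is a hypothetical helper name, and SAMPLE_HITS is trimmed from the output above.

```python
# Trimmed copy of the hits shown above (only the fields used below).
SAMPLE_HITS = [
    {"_id": "1", "_score": 0.09427826, "_source": {"word": "car", "type": "entity"}},
    {"_id": "4", "_score": 0.09427826, "_source": {"word": "want", "type": "intent"}},
    {"_id": "3", "_score": 0.09427826, "_source": {"word": "buy", "type": "intent"}},
]


def hits_to_annotations(hits):
    """Keep only the word and its type from each hit's _source."""
    return [
        {"word": hit["_source"]["word"], "type": hit["_source"]["type"]}
        for hit in hits
    ]


if __name__ == "__main__":
    for annotation in hits_to_annotations(SAMPLE_HITS):
        print(annotation)
```

The order follows the order of the hits, not the order of the words in the phrase, since all three documents got the same score here.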
Does this help?
Upvotes: 2