Reputation: 7840
I am novice in Python and NLP, and my problem is how to finding out Intent of given questions, for example I have sets of questions and answers like this :
question:What is NLP; answer: NLP stands for Natural Language Processing
I did some basic POS tagger
on given questions in above question I get entety [NLP]
I also did String Matching
using this algo.
Basically I faced following issues :
what is NLP
then it will return exact answersmeaning of NLP
then it failDefinition of NLP
then it failWhat is Natural Language Processing
then it failSo how I should identify user intent of given questions because in my case String matching or pattern matching not works.
Upvotes: 11
Views: 28480
Reputation: 166
You can do intent identification with DeepPavlov, it supports multi-label classification. More information can be found in http://docs.deeppavlov.ai/en/master/components/classifiers.html The demo page https://demo.ipavlov.ai
Upvotes: 13
Reputation: 676
you can use spacy for training a custom parser for chat intent semantics.
spaCy's parser component can be used to trained to predict any type of tree structure over your input text. You can also predict trees over whole documents or chat logs, with connections between the sentence-roots used to annotate discourse structure.
for example: "show me the best hotel in berlin"
('show', 'ROOT', 'show')
('best', 'QUALITY', 'hotel') --> hotel with QUALITY best
('hotel', 'PLACE', 'show') --> show PLACE hotel
('berlin', 'LOCATION', 'hotel') --> hotel with LOCATION berlin
To train the model you need data in this format:
# training data: texts, heads and dependency labels
# for no relation, we simply chose an arbitrary dependency label, e.g. '-'
TRAIN_DATA = [
("find a cafe with great wifi", {
'heads': [0, 2, 0, 5, 5, 2], # index of token head
'deps': ['ROOT', '-', 'PLACE', '-', 'QUALITY', 'ATTRIBUTE']
}),
("find a hotel near the beach", {
'heads': [0, 2, 0, 5, 5, 2],
'deps': ['ROOT', '-', 'PLACE', 'QUALITY', '-', 'ATTRIBUTE']
})]
TEST_DATA:
input : show me the best hotel in berlin
output: [
('show', 'ROOT', 'show'),
('best', 'QUALITY', 'hotel'),
('hotel', 'PLACE', 'show'),
('berlin', 'LOCATION', 'hotel')
]
for more details Please check the below link. https://spacy.io/usage/examples#intent-parser
Upvotes: 3
Reputation: 114
For a general knowledge and list of excellent examples for question and answering based systems, the leaderboard of NLP in the industry are listed here: https://rajpurkar.github.io/SQuAD-explorer/ This process can actually get really complicated depending on the complexity and range of your domain. For example, more advanced approaches apply first order + propositional logic and complex neural nets. One of the more impressive solutions I've seen is bidirectional attention flow: https://github.com/allenai/bi-att-flow, demo is here: http://beta.moxel.ai/models/strin/bi-att-flow/latest
In practice, I have found that if your corpus has more domain-specific terms, you will need to build your own dictionary. In your example, "NLP" and "Natural Language Processing" are the same entity, so you need to include this in a dictionary.
Basically, consider yourself really lucky if you can get away with just a pure statistical approach like cosine distance. You'll likely need to combine with a lexicon-based approach as well. All the NLP projects I have done have had domain-specific terminology and "slang", so I have used combined both statistical and lexicon based methods, especially for feature extraction like topics, intents, and entities.
Upvotes: 2
Reputation: 6039
I think this really depends on how your frame your problem and your domain. Here is a dataset that might be useful for question type classification and here is an implementation.
These being said, I think you'll need to annotate your text, possibly by Chunker, SRL, etc and extract interesting pattern.
Upvotes: 0