Neo-coder

Reputation: 7840

Python NLP Intent Identification

I am a novice in Python and NLP, and my problem is how to find the intent of a given question. For example, I have sets of questions and answers like this:

question:What is NLP; answer: NLP stands for Natural Language Processing

I ran a basic POS tagger on the given questions; for the question above I get the entity [NLP]. I also did string matching using this algorithm.

Basically, I faced the following issues:

  1. If the user asks "what is NLP", it returns the exact answer
  2. If the user asks "meaning of NLP", it fails
  3. If the user asks "Definition of NLP", it fails
  4. If the user asks "What is Natural Language Processing", it fails

So how should I identify the user's intent for a given question? In my case, string matching or pattern matching does not work.

Upvotes: 11

Views: 28480

Answers (4)

Vlad P

Reputation: 166

You can do intent identification with DeepPavlov; it supports multi-label classification. More information can be found at http://docs.deeppavlov.ai/en/master/components/classifiers.html and a demo page is available at https://demo.ipavlov.ai

Upvotes: 13

manish Prasad

Reputation: 676

You can use spaCy to train a custom parser for chat intent semantics.

spaCy's parser component can be trained to predict any type of tree structure over your input text. You can also predict trees over whole documents or chat logs, with connections between the sentence roots used to annotate discourse structure.

For example: "show me the best hotel in berlin"

('show', 'ROOT', 'show')
('best', 'QUALITY', 'hotel') --> hotel with QUALITY best
('hotel', 'PLACE', 'show') --> show PLACE hotel
('berlin', 'LOCATION', 'hotel') --> hotel with LOCATION berlin

To train the model you need data in this format:

# training data: texts, heads and dependency labels
# for no relation, we simply chose an arbitrary dependency label, e.g. '-'
TRAIN_DATA = [
    ("find a cafe with great wifi", {
        'heads': [0, 2, 0, 5, 5, 2],  # index of token head
        'deps': ['ROOT', '-', 'PLACE', '-', 'QUALITY', 'ATTRIBUTE']
    }),
    ("find a hotel near the beach", {
        'heads': [0, 2, 0, 5, 5, 2],
        'deps': ['ROOT', '-', 'PLACE', 'QUALITY', '-', 'ATTRIBUTE']
    })]

TEST_DATA:
input : show me the best hotel in berlin
output: [
      ('show', 'ROOT', 'show'),
      ('best', 'QUALITY', 'hotel'),
      ('hotel', 'PLACE', 'show'),
      ('berlin', 'LOCATION', 'hotel')
    ]
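
To make the heads/deps format concrete, here is a minimal sketch (plain Python, no spaCy needed) that turns one training annotation into the same (token, label, head) triples shown in the output above. The helper name to_triples is hypothetical, and it assumes whitespace tokenization lines up with the annotation, which holds for these short examples but not in general.

```python
def to_triples(text, heads, deps, skip_label="-"):
    """Turn one annotation into (token, label, head_token) triples.

    Hypothetical helper for inspecting training data; not part of spaCy.
    """
    tokens = text.split()  # assumption: whitespace tokens match the annotation
    if not (len(tokens) == len(heads) == len(deps)):
        raise ValueError("heads and deps need exactly one entry per token")
    return [
        (token, dep, tokens[head])
        for token, head, dep in zip(tokens, heads, deps)
        if dep != skip_label  # '-' marks "no relation", so leave it out
    ]


text, annotation = ("find a cafe with great wifi", {
    "heads": [0, 2, 0, 5, 5, 2],
    "deps": ["ROOT", "-", "PLACE", "-", "QUALITY", "ATTRIBUTE"],
})
triples = to_triples(text, annotation["heads"], annotation["deps"])
for triple in triples:
    print(triple)
```

Each head is the index of the token that the current token attaches to, so 'wifi' (index 5) with head 2 reads as "cafe with ATTRIBUTE wifi". Printing your own annotations this way is a quick check that the indices say what you meant before training on them.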

For more details, please check the link below: https://spacy.io/usage/examples#intent-parser

Upvotes: 3

saucy wombat

Reputation: 114

For general background and a list of excellent examples of question-answering systems, see the SQuAD leaderboard, which is widely used as a benchmark in the field: https://rajpurkar.github.io/SQuAD-explorer/ This process can get really complicated depending on the complexity and range of your domain. For example, more advanced approaches apply first-order and propositional logic together with complex neural nets. One of the more impressive solutions I've seen is bidirectional attention flow: https://github.com/allenai/bi-att-flow, with a demo here: http://beta.moxel.ai/models/strin/bi-att-flow/latest

In practice, I have found that if your corpus has many domain-specific terms, you will need to build your own dictionary. In your example, "NLP" and "Natural Language Processing" are the same entity, so your dictionary needs to map both of them to it.

Basically, consider yourself really lucky if you can get away with a purely statistical approach like cosine distance. You'll likely need to combine it with a lexicon-based approach as well. All the NLP projects I have done have had domain-specific terminology and "slang", so I have combined statistical and lexicon-based methods, especially for extracting features like topics, intents, and entities.
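
As a rough illustration of combining the two, here is a minimal sketch that first normalizes a question with a small hand-built lexicon and then scores it against stored questions with bag-of-words cosine similarity. The LEXICON and FAQ contents, the function names, and the 0.5 threshold are all assumptions made up for this example; a real system would need a much larger dictionary and better features than raw word counts.

```python
import math
import re
from collections import Counter

# Hypothetical domain lexicon: surface phrase -> canonical form.
LEXICON = {
    "natural language processing": "nlp",
    "meaning of": "what is",
    "definition of": "what is",
}

# Hypothetical question/answer store, using the pair from the question.
FAQ = {
    "what is nlp": "NLP stands for Natural Language Processing",
}


def normalize(text):
    """Lowercase, strip punctuation, then apply the lexicon replacements."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    text = " ".join(text.split())
    for phrase, canonical in LEXICON.items():
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", canonical, text)
    return text


def cosine(a, b):
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def answer(question, threshold=0.5):
    """Return the best-matching stored answer, or None below the threshold."""
    q = normalize(question)
    best = max(FAQ, key=lambda known: cosine(q, normalize(known)))
    return FAQ[best] if cosine(q, normalize(best)) >= threshold else None


for q in ["What is NLP?", "meaning of NLP", "Definition of NLP",
          "What is Natural Language Processing"]:
    print(q, "->", answer(q))
```

With the lexicon in place, all four phrasings from the question normalize to the same string and hit the stored answer; without it, only the first one would match. The statistical part then only has to absorb whatever variation the dictionary does not cover.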

Upvotes: 2

Daniel

Reputation: 6039

I think this really depends on how you frame your problem and on your domain. Here is a dataset that might be useful for question type classification and here is an implementation.

That being said, I think you'll need to annotate your text, possibly with a chunker, SRL, etc., and extract interesting patterns.

Upvotes: 0
