What would be the right way to extract intent from natural language input in this case?

Question

Before posting this question, I spent a whole day reading questions under the machine-learning and nlp tags on Stack Overflow.

I have an input statement of the following form:

"I am looking for an iPhone 6S possibly rose gold with 16 GB memory, what is the best deal that I can get on this"

Here is what I want to extract from that line:

{intent: "discount", brand: "Apple", productLine: "iPhone", model: "6S", color: "rose gold", memory: "16GB"}

My query could be about phones, laptops, or anything else, and it may or may not name a particular model. For example, it could have been "What is the best mobile phone to buy?"

Here is what I am planning to do, but I would love some feedback or suggestions if you think there is a better way to do it.

Stage 1

Clean the text, tokenize it, and remove stop words.

Stage 2

Extract the category, brand, model, and product line from the sentence. I believe I will need a database of some sort that holds all this information, and I will have to fuzzy-match the brand names against the sentence. I am not sure how to do this in the most efficient manner.

One approach is to scan the complete database, with possibly thousands of models, and check whether each brand name is present in the sentence. I believe this has to be a fuzzy search, in case the person writes i-Ball instead of iBall.
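To make the fuzzy brand lookup concrete, here is a minimal sketch using only Python's standard `difflib` module. The `BRANDS` list, the `normalize` and `find_brands` helpers, and the 0.85 cutoff are all assumptions for illustration; a real system would load the brand names from the product database and tune the cutoff.

```python
import difflib
import re

# Hypothetical brand list; in practice this would come from the product database.
BRANDS = ["Apple", "iBall", "Samsung", "Lenovo"]


def normalize(text):
    """Lowercase and keep only letters and digits, so 'i-Ball' becomes 'iball'."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


# Build the normalized lookup once, so each query does not rescan the raw data.
BRAND_INDEX = {normalize(brand): brand for brand in BRANDS}


def find_brands(sentence, cutoff=0.85):
    """Return canonical brand names that fuzzily match a token of the sentence."""
    found = []
    for token in sentence.split():
        key = normalize(token)
        if not key:
            continue
        # get_close_matches ranks candidates by SequenceMatcher similarity.
        match = difflib.get_close_matches(key, list(BRAND_INDEX), n=1, cutoff=cutoff)
        if match and BRAND_INDEX[match[0]] not in found:
            found.append(BRAND_INDEX[match[0]])
    return found


print(find_brands("Is the i-Ball tablet any good"))
```

This only handles single-word brand names, and it would not map "iPhone" to "Apple" by itself; that would need a product-line-to-brand table from the same database.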

Stage 3

Extract features such as rose gold and 16 GB memory. Should I use a regex here, or are there more sophisticated methods to extract such information?
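As a baseline for the regex option, the sketch below pulls a memory size and a colour out of the example sentence. The `COLORS` vocabulary and the `extract_features` helper are hypothetical; a real colour list would come from the product specifications.

```python
import re

# Hypothetical colour vocabulary; a real list would come from the product specs.
COLORS = ["rose gold", "space grey", "silver", "gold", "black"]

# A number followed by an optional space and a storage unit, e.g. '16 GB' or '16gb'.
MEMORY_RE = re.compile(r"\b(\d+)\s*(GB|TB|MB)\b", re.IGNORECASE)

# Longer names are listed first so that 'rose gold' is preferred over 'gold'.
COLOR_RE = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in sorted(COLORS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def extract_features(sentence):
    """Return a dict with the first memory size and colour found, if any."""
    features = {}
    memory = MEMORY_RE.search(sentence)
    if memory:
        features["memory"] = memory.group(1) + memory.group(2).upper()
    color = COLOR_RE.search(sentence)
    if color:
        features["color"] = color.group(1).lower()
    return features


print(extract_features(
    "I am looking for an iPhone 6S possibly rose gold with 16 GB memory"
))
```

A regex like this cannot tell whether "16 GB" refers to RAM or storage, and it only knows the colours it was given, which is where the fuzzy or learned approaches would come in.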

One approach I thought of was to extract unigrams, bigrams, and trigrams from the input sentence and then compare them with the product specifications in a fuzzy manner. What about record linkage libraries for this?
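The n-gram idea can be sketched as follows, again with the standard library's `difflib` standing in for a dedicated record linkage library. The `ngrams` and `match_spec_values` helpers, the sample specification values, and the 0.9 threshold are assumptions made for illustration.

```python
import difflib


def ngrams(tokens, max_n=3):
    """Yield all unigrams, bigrams and trigrams as space-joined strings."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def match_spec_values(sentence, spec_values, threshold=0.9):
    """Map each specification value to its best n-gram similarity above the threshold."""
    tokens = sentence.lower().replace(",", " ").split()
    matches = {}
    for gram in ngrams(tokens):
        for value in spec_values:
            score = difflib.SequenceMatcher(None, gram, value.lower()).ratio()
            if score >= threshold and score > matches.get(value, 0.0):
                matches[value] = score
    return matches


# Hypothetical specification values for one product.
specs = ["Rose Gold", "16 GB", "Silver"]
print(match_spec_values("iPhone 6S possibly rose gold with 16 GB memory", specs))
```

This compares every n-gram with every specification value, so the cost grows with the size of the catalogue; indexing or blocking, as record linkage tools do, would be needed at scale.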

Stage 4

How do I strip the sentence of the extra material, such as product names and features, and classify it as a discount, price-range, or review query? I am assuming that a classifier works well when the sentences are not stuffed with product information; otherwise it is going to need a huge training set.
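One way to picture the "free the sentence of product info" step is to replace each extracted value with a slot placeholder before the text reaches the classifier. This is only a sketch: the `mask_entities` helper and the `<SLOT>` placeholder format are assumptions, and the entity values are taken as already extracted by the earlier stages.

```python
import re


def mask_entities(sentence, entities):
    """Replace each extracted entity value with an upper-case slot placeholder."""
    if not entities:
        return sentence
    # Map each lower-cased surface form to its placeholder, e.g. 'rose gold' -> '<COLOR>'.
    placeholders = {
        value.lower(): "<" + slot.upper() + ">" for slot, value in entities.items()
    }
    # Try longer values first and substitute in a single pass, so that a
    # placeholder inserted for one entity is never rewritten by another.
    alternatives = sorted(placeholders, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(a) for a in alternatives) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: placeholders[m.group(1).lower()], sentence)


entities = {"productLine": "iPhone", "model": "6S", "color": "rose gold", "memory": "16 GB"}
print(mask_entities(
    "I am looking for an iPhone 6S possibly rose gold with 16 GB memory, "
    "what is the best deal that I can get on this",
    entities,
))
```

With the product details collapsed into a handful of placeholder tokens, sentences about different products look alike to the classifier, which is the effect the question is hoping for.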

Stage 5

How do I know when to show a specific product and when to show generic results? For example, the iPhone query above is quite specific, whereas a query about the best mobile phone is generic. Should I use a naive Bayes classifier for this, or logistic regression?

The ultimate question

What is the best way to go about such an implementation: NLTK + scikit-learn, TFLearn, or TensorFlow?

I am assuming that neural networks only accept numbers as input and produce numbers as output. Does that mean I will have to convert the input to a vector representation?
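For reference, the simplest vector representation is a bag-of-words count vector, sketched below in plain Python. The `build_vocabulary` and `vectorize` helpers and the two training sentences are hypothetical; libraries such as scikit-learn provide equivalent vectorizers, and word embeddings are another option.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Assign each distinct lower-cased token a fixed column index."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(sentence, vocab):
    """Return a list of token counts, one entry per vocabulary word."""
    counts = Counter(sentence.lower().split())
    vector = [0] * len(vocab)
    for token, count in counts.items():
        if token in vocab:  # words outside the vocabulary are ignored
            vector[vocab[token]] = count
    return vector


vocab = build_vocabulary(["best deal on phone", "best phone review"])
print(vectorize("best best phone camera", vocab))
```

Every sentence becomes a list of the same length, which is the kind of fixed-size numeric input that naive Bayes, logistic regression, or a neural network expects.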

Thank you for your suggestions in advance.

