PirateApp

Reputation: 6202

What would be the right way to extract intent from natural language input in this case?

Before posting this question, I spent a whole day reading questions under the machine-learning and nlp tags on Stack Overflow.

I have an input statement of the following form:

"I am looking for an iPhone 6S possibly rose gold with 16 GB memory, what is the best deal that I can get on this"

Here is what I want to extract from that line:

{intent: "discount", brand: "Apple", productLine: "iPhone", model: "6S", color: "rose gold", memory: "16GB"}

The query could be about phones, laptops, or anything else, and it may or may not be specific about a particular model. For example, it could have been "What is the best mobile phone to buy".

Here is what I am planning to do, but I would love some feedback or suggestions if you think there is a better way to do it.

Stage 1

Clean the text, tokenize, and remove stop words.

Stage 2

Extract the category, brand, model, and product line from the sentence. I believe I will need a database of some sort that has all this information, and I will simply have to do a fuzzy match against the brand names inside the sentence. I am not sure how to do this in the most efficient manner.

One approach is to scan the complete database, with possibly thousands of models, and check whether each brand name is present in the sentence. I believe this has to be a fuzzy search in case the person writes i-Ball instead of iBall.
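A minimal sketch of that fuzzy brand lookup, using only the standard library's `difflib`. The brand list, the `normalize` helper, and the 0.85 cutoff are assumptions made for illustration; a real list would come from the product database. Stripping punctuation before comparing is what lets "i-Ball" and "iBall" meet.

```python
import difflib
import re

# Hypothetical brand list; in practice this would be loaded from the product database.
BRANDS = ["Apple", "Samsung", "iBall", "Lenovo"]


def normalize(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def find_brands(sentence, brands=BRANDS, cutoff=0.85):
    """Return the known brands that fuzzily match a token in the sentence."""
    index = {normalize(b): b for b in brands}  # normalized form -> display name
    found = []
    for token in sentence.split():
        key = normalize(token)
        if not key:
            continue
        # The cutoff is an arbitrary similarity threshold; tune it on real queries.
        match = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
        if match and index[match[0]] not in found:
            found.append(index[match[0]])
    return found


brands_found = find_brands("Is the i-Ball tablet any good")
```

This scans every brand for every token, which is fine for a few thousand names; a larger catalogue would probably want a prebuilt index instead.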

Stage 3

Feature extraction, such as rose gold and 16 GB memory. Should I use a regex here, or are there more sophisticated methods to extract such info?
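A minimal sketch of the regex option, assuming memory is always written as a number followed by MB, GB, or TB, and that colours come from a known vocabulary. The `COLORS` list and the `extract_features` name are hypothetical.

```python
import re

# Hypothetical colour vocabulary; a real one would come from the product catalogue.
COLORS = ["rose gold", "space grey", "silver", "gold", "black"]

# A number, optional whitespace, then a storage unit: "16 GB", "16GB", "1 tb".
MEMORY_RE = re.compile(r"\b(\d+)\s*(MB|GB|TB)\b", re.IGNORECASE)

# Longest names first so that "rose gold" is preferred over plain "gold".
COLOR_RE = re.compile(
    r"\b("
    + "|".join(re.escape(c) for c in sorted(COLORS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_features(sentence):
    """Pull the first memory size and the first colour out of the sentence."""
    features = {}
    memory = MEMORY_RE.search(sentence)
    if memory:
        features["memory"] = memory.group(1) + memory.group(2).upper()
    color = COLOR_RE.search(sentence)
    if color:
        features["color"] = color.group(1).lower()
    return features


query = (
    "I am looking for an iPhone 6S possibly rose gold with 16 GB memory, "
    "what is the best deal that I can get on this"
)
features = extract_features(query)
# features == {"memory": "16GB", "color": "rose gold"}
```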

One approach that I thought of was to extract unigrams, bigrams, and trigrams from the input sentence and then compare them with the product specification in a fuzzy manner. What about record linkage libraries for this?
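The n-gram idea can be sketched as follows, again with `difflib` standing in for whatever fuzzy matcher is eventually chosen. The spec values and the 0.8 cutoff are made up for the example.

```python
import difflib


def ngrams(tokens, max_n=3):
    """All unigrams, bigrams, and trigrams of the token list, joined by spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def match_specs(sentence, spec_values, cutoff=0.8):
    """Map each spec value to the closest n-gram in the sentence, if any."""
    candidates = ngrams(sentence.lower().split())
    hits = {}
    for value in spec_values:
        best = difflib.get_close_matches(value.lower(), candidates, n=1, cutoff=cutoff)
        if best:
            hits[value] = best[0]
    return hits


# Hypothetical specification values for one product.
specs = ["Rose Gold", "16 GB", "Space Grey"]
hits = match_specs("iphone 6s possibly rose gold with 16 gb memory", specs)
```

Here `hits` maps "Rose Gold" and "16 GB" to the matching spans in the query, and "Space Grey" is absent because no n-gram comes close to it.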

Stage 4

How do I strip the sentence of all the extra junk, such as product names and features, and classify it as a discount, price range, or review type of query? I am assuming that a classifier works well when the sentences are not stuffed with product info; otherwise the classifier is going to need a huge training set.
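One way to do this, sketched under the assumption that the earlier stages have already found the product strings: replace each one with a generic placeholder so the classifier sees the same token whatever the product is. The `mask_entities` helper and the slot names are hypothetical.

```python
import re


def mask_entities(sentence, entities):
    """Replace each extracted surface string with its upper-cased slot name.

    `entities` maps a slot name (e.g. "product") to the text found for it.
    """
    masked = sentence
    # Longest strings first so a short entity cannot split a longer one.
    for slot, surface in sorted(entities.items(), key=lambda kv: len(kv[1]), reverse=True):
        pattern = re.compile(re.escape(surface), re.IGNORECASE)
        masked = pattern.sub(lambda _m, s=slot: s.upper(), masked)
    return masked


masked = mask_entities(
    "I am looking for an iPhone 6S possibly rose gold with 16 GB memory, what is the best deal",
    {"product": "iPhone 6S", "color": "rose gold", "memory": "16 GB"},
)
# masked == "I am looking for an PRODUCT possibly COLOR with MEMORY memory, what is the best deal"
```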

Stage 5

How do I know when to show a specific product and when to show generic results? For example, the query about the iPhone above is quite specific, whereas if I am asking about the best mobile phone, it's a generic one. Should I use a Naive Bayes classifier for this, or logistic regression?

The ultimate question

What is the best way to go about such an implementation: NLTK + scikit-learn, TFLearn, or TensorFlow?

I am assuming that neural networks only accept numbers and output numbers. Does that mean I will have to convert the input to a vector representation?

Thank you for your suggestions in advance.

Upvotes: 4

Views: 2497

Answers (1)

Aaron

Reputation: 2364

My advice would be not to worry about TensorFlow if you are just starting off. You can use scikit-learn (sklearn) with a built-in classifier like Naive Bayes. There are tutorials that show how to get from text to vectors of numbers and feed those into a classifier to get a predicted label.
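As a rough sketch of what that looks like, `CountVectorizer` turns each sentence into a vector of word counts and `MultinomialNB` learns a label from those vectors. The training sentences and the three labels below are invented for illustration; a real model needs far more data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training set, made up for this example.
train_texts = [
    "what is the best deal I can get on this",
    "any discount or offer on this phone",
    "where can I get this cheapest",
    "how much does this cost",
    "what is the price range for laptops",
    "show me phones under 500 dollars",
    "is this phone any good",
    "what do people think about this laptop",
    "are the reviews for this model positive",
]
train_labels = [
    "discount", "discount", "discount",
    "price", "price", "price",
    "review", "review", "review",
]

# Text -> unigram counts -> Naive Bayes.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

prediction = model.predict(["what is the best deal on a laptop"])[0]
```

`prediction` is one of the three labels; with a training set this small, it should not be taken as meaningful.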

If the classification problem you are dealing with has a lot to do with topic or intent, then unigram statistics are surprisingly effective. You can start by just using unigrams, and if that isn't getting you where you need to be, try joining multi-word expressions to make inputs like "iPhone_6S possibly rose_gold with 16_GB memory".
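Joining multi-word expressions can be as simple as the sketch below, assuming you already have a list of phrases to join (for example from a product database). The `join_phrases` name and the phrase list are hypothetical.

```python
import re


def join_phrases(text, phrases):
    """Replace the spaces inside each known phrase with underscores."""
    # Longest phrases first so a short phrase cannot break up a longer one.
    for phrase in sorted(phrases, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        text = pattern.sub(lambda m: m.group(0).replace(" ", "_"), text)
    return text


joined = join_phrases(
    "iPhone 6S possibly rose gold with 16 GB memory",
    ["iPhone 6S", "rose gold", "16 GB"],
)
# joined == "iPhone_6S possibly rose_gold with 16_GB memory"
```

Each joined expression then counts as a single unigram when the text is vectorized.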

Upvotes: 2
