NLP-steps or approch to classify text?

Question

I'm working on a project to classify restaurant reviews on sentiment(positive or negative) basis. Also I want to classify that if these comments belongs to food, service, value-for-money, etc category. I am unable to link the steps or the methodology provided on the internet. can anyone provide detailed method or steps to get to the solution.

Vlad · Accepted Answer

How about using bag of words model. It's been tried and tested for ages. It has some downsides compared to more modern methods, but you can still get decent results. And there are tons of material on internet to help you:

Normalize documents to the form ingestable by your pipeline
Convert documents to vectors and perform TF-IDF to filter irrelevant terms. Here is a good tutorial. And convert them to vector form.
Split your documents get some subset of documents and mark the ones that belong to training data according to classes ( Sentiment ) / type of comments. Clearly your documents will belong to two classes.
Apply some type of dimensionality reduction technique to make your models more robust, good discussion is here
Train your models on your training data. You need at least two models one for sentiment and one for type. Some algorithms work with binary classes only so you might need more than to models for comment type ( Food, Value, Service). This might be a good thing because a comment can belong to more than one class ( Food quality and Value, or Value and Service). Scikit-learn has a lot of good models, also I highly recommend orange toolbox it's like a GUI for data science.
Validate your models using validation set. If you accuracy is satisfactory (most classical methods like SVM should give you at leat 90%) go ahead and use it for incoming data

NLP-steps or approch to classify text?

Answers (1)

Related Questions