Reputation: 69
How can we build a working classifier for sentiment analysis, given that it needs to be trained on huge data sets?
I have a huge data set to train on, but the classifier object (I am using Python) gives a memory error with just 3,000 words as features, and I need to train with more than 100K words.
My first idea was to divide the huge data set into smaller parts, build a classifier object for each part, store each one in a pickle file, and then use all of them together. But it seems that using all the classifier objects for testing is not possible, since only one of the objects is used during testing.
The solutions that come to mind are either to combine all the saved classifier objects stored in the pickle files (which I have not managed to do) or to keep updating the same object with each new training set (but again, it gets overwritten rather than updated).
I could not find any solution to this problem, even though it seems fundamental to machine learning: every machine learning project has to be trained on a huge data set, and the object built from such a data set will always run into memory errors.
So how do I solve this problem? I am open to any solution, but I would especially like to hear what people who work on real-world machine learning projects do.
Code snippet:
import nltk
from nltk.corpus import movie_reviews

# Each document is (list of word tokens, category label).
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

# Frequency distribution over all lower-cased words in the corpus.
all_words = []
for w in movie_reviews.words():
    all_words.append(w.lower())
all_words = nltk.FreqDist(all_words)

# Use the 3,000 most frequent words as features
# (most_common() guarantees the top-frequency words, unlike plain keys()).
word_features = [w for w, _ in all_words.most_common(3000)]

def find_features(document):
    """Binary bag-of-words features: is each feature word present in the document?"""
    words = set(document)
    features = {}
    for w in word_features:
        features[w] = (w in words)
    return features

featuresets = [(find_features(rev), category) for (rev, category) in documents]

# 90/10 train/test split.
numtrain = int(len(documents) * 90 / 100)
training_set = featuresets[:numtrain]
testing_set = featuresets[numtrain:]

classifier = nltk.NaiveBayesClassifier.train(training_set)
PS: I am using the NLTK toolkit with NaiveBayes. My training data set is opened and stored in documents.
Upvotes: 0
Views: 138
Reputation: 66825
There are two things you seem to be missing:
Text data sets are usually extremely sparse, so you should store them as sparse matrices. With such a representation you should be able to hold millions of documents in memory with a vocabulary of 100,000 words.
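For example, here is a minimal sketch of a sparse bag-of-words representation, assuming scikit-learn is available (the question only uses NLTK, so CountVectorizer here is a suggested alternative, not part of the original code):

from sklearn.feature_extraction.text import CountVectorizer

texts = ["loved every minute of it", "a dull, predictable mess"]  # toy documents
vectorizer = CountVectorizer(max_features=100000)                 # cap the vocabulary size
X = vectorizer.fit_transform(texts)                               # scipy sparse (CSR) matrix

# Only the non-zero counts are stored, so even a huge vocabulary stays cheap:
print(X.shape)   # (number of documents, vocabulary size)
print(X.nnz)     # number of non-zero entries actually stored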
Many modern learning methods are trained in a mini-batch fashion, meaning that you never need the whole data set in memory; instead, you feed the model random subsets of the data while still training a single model. This way your data set can be arbitrarily large, memory consumption is constant (fixed by the mini-batch size), and only the training time scales with the number of samples.
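A minimal sketch of that idea, again assuming scikit-learn: it swaps in HashingVectorizer plus MultinomialNB.partial_fit in place of the NLTK classifier from the question, and iter_minibatches is a hypothetical placeholder for reading your data from disk in chunks:

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)  # no vocabulary kept in memory
clf = MultinomialNB()
all_classes = ["pos", "neg"]

def iter_minibatches():
    """Hypothetical generator yielding (texts, labels) chunks read from disk."""
    yield ["loved every minute of it", "a dull, predictable mess"], ["pos", "neg"]

for texts, labels in iter_minibatches():
    X = vectorizer.transform(texts)                  # sparse matrix for this batch only
    clf.partial_fit(X, labels, classes=all_classes)  # one model, updated incrementally

Only one mini-batch is vectorized and held in memory at a time, so memory use is bounded by the batch size rather than by the size of the whole data set.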
Upvotes: 1