Bigdata User
Bigdata User

Reputation: 11

Save Naive Bayes Classifier in memory

I am new in NLTK and machine learning. I'm using Python with NLTK Naive Bayes Classifier . I have create a Naive Bayes Classifier for text classification using NLTK and save it on disk. I am also able to load it when needed to classify some test data by using this python code:

import pickle
f = open('classifier.pickle')
classifier = pickle.load(f)
f.close()

But my problem is that whenever an new test data come , I have to load this classifier again and again in memory that takes lots of time (2-3 min) to load as it have large size. Also if I have to run two instances of the same sentimental analysis program, that will take double RAM as both program will load this classifier separately. My questions is: Is there any technique to store this classifier in memory so that whenever needed the sentimental anylysis programs can read this directly from memory or is there any other method through which the load time of the classifier can be minimize. Thanks in advance for your help.

Upvotes: 1

Views: 812

Answers (1)

Tommy
Tommy

Reputation: 13652

You can't have it both ways. You can either keep pickling/unpickling one at a time to use less RAM, or you can store both in memory, using twice as much ram, but reducing load times and disk i/o wait times.

Are the two classifiers trained using different training data, or are you using the same classifier in parallel? It sounds like the latter from your usage of "two instances", and in that case you may want to look into threading to allow the same classifier to work with two sets of data (some parallelism may be achieved by classify some of the data, then doing some other stuff like results processing to allow the other thread to classify, repeat).

My expertise in this comes from having started an open source NLTK based sentiment analysis system: https://bitbucket.org/tommyjcarpenter/evopminer.

Upvotes: 1

Related Questions