dizwe

Reputation: 81

How can the nltk NaiveBayes classifier learn more featuresets after training ends?

I'm building an NLTK classifier that assigns sentences to categories.

I've already trained the classifier on featuresets from 11,000 sentences.

from nltk.classify import naivebayes

train_set, test_set = featuresets[1000:], featuresets[:1000]  # hold out the first 1000 for testing
classifier = naivebayes.NaiveBayesClassifier.train(train_set)

But I want to add more (sentence, category) featuresets to improve the classifier. The only way I know is to append the new featuresets to the list of already-learned featuresets, which means building a new classifier from scratch. I think this method is inefficient because it takes a lot of time to retrain just to add one or a few more sentences.

Is there a good way to improve the classifier's quality by adding featuresets?

Upvotes: 0

Views: 186

Answers (1)

greeness

Reputation: 16114

Two things.

  1. Naive Bayes is usually super fast. It only visits your training data once, accumulating the feature-class co-occurrence statistics; after that, it uses those statistics to build the model. So it's usually not a problem to simply re-train your model on the old data plus the new (incremental) data.

  2. It's doable to avoid redoing the steps above when new data comes in, as long as you still have the feature-class stats stored somewhere. You just visit the new data the same way as in step 1 and keep updating the feature-class co-occurrence counts. At the end of the day you have new numerators (m) and denominators (n), which apply to both the class priors P(C) and the per-class word probabilities P(W|C). You derive the probabilities as m/n; see the sketch after this list.
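A minimal sketch of what keeping and updating those counts could look like. This is illustrative Python, not NLTK's API: the IncrementalNB class, its count dictionaries, and the add-one smoothing choice are all assumptions made for the example. As far as I know, NLTK's NaiveBayesClassifier doesn't expose this kind of incremental update out of the box, so you would maintain counts like these yourself (or re-train, per point 1).

from collections import defaultdict

class IncrementalNB:
    # Hypothetical helper, not part of NLTK: keeps the raw feature-class
    # co-occurrence counts so new featuresets can be folded in later
    # without revisiting the old training sentences.
    def __init__(self):
        self.class_counts = defaultdict(int)          # n: documents per class, for P(C) and P(W|C)
        self.feature_class_counts = defaultdict(int)  # m: feature-class co-occurrences, for P(W|C)
        self.total_docs = 0
        self.vocab = set()

    def update(self, featuresets):
        # featuresets: iterable of (feature_dict, label), the same shape NLTK expects
        for features, label in featuresets:
            self.total_docs += 1
            self.class_counts[label] += 1
            for feat, value in features.items():
                if value:
                    self.feature_class_counts[(feat, label)] += 1
                    self.vocab.add(feat)

    def prior(self, label):
        # P(C) = count(C) / total documents seen so far
        return self.class_counts[label] / self.total_docs

    def likelihood(self, feat, label):
        # P(W|C) with add-one smoothing: (m + 1) / (n + |V|)
        m = self.feature_class_counts[(feat, label)]
        n = self.class_counts[label]
        return (m + 1) / (n + len(self.vocab))

Usage would be a first pass over the existing data, then cheap updates later:

nb = IncrementalNB()
nb.update(train_set)          # first pass over the 11,000 sentences
nb.update(new_featuresets)    # later: fold in new data without retraining from scratch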

Friendly reminder of Bayesian formulas in document classification:

-- Given a document D, the probability that the document falls in category C_j is:

P(C_j|D) = P(D|C_j)*P(C_j)/P(D)

-- That probability is proportional to:

P(C_j|D) ~ P(W1|C_j) P(W2|C_j) ... P(Wk|C_j) * P(C_j) 

based on:

  • the naive Bayes assumption (all words W1, W2, ..., Wk in the doc are independent given the class),
  • throwing away P(D), because every class has the same P(D) as denominator (thus we say proportional to, not equal to).

-- Now all probabilities on the right-hand side can be computed as a corresponding fraction (m/n), where m and n are stored in (or can be derived from) the feature-class co-occurrence matrix.
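Continuing the illustrative IncrementalNB sketch above (again an assumption for the example, not NLTK code), the right-hand side can be evaluated directly from those fractions; doing it in log space avoids numerical underflow when multiplying many small probabilities:

import math

def score(nb, features, label):
    # log P(C_j) + sum over present words of log P(W_k|C_j)
    s = math.log(nb.prior(label))
    for feat, value in features.items():
        if value:
            s += math.log(nb.likelihood(feat, label))
    return s

def classify(nb, features):
    # pick the class C_j with the highest proportional score
    return max(nb.class_counts, key=lambda label: score(nb, features, label))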

Upvotes: 1
