esg1

Reputation: 43

Training Bayesian Classifier

I'm trying to train and test a Bayesian Classifier in Python.

These lines of code are from an example I found here, but I don't understand what they do.

train_labels = np.zeros(702)
train_labels[351:701] = 1
train_matrix = extract_features(train_dir)

There is a similar code block later for the test set:

test_matrix = extract_features(test_dir)
test_labels = np.zeros(260)
test_labels[130:260] = 1

I'm wondering what this does and how I can apply it to a different classification example. What do the numbers in [] mean? Many thanks.

Upvotes: 1

Views: 62

Answers (1)

Pari Rajaram

Reputation: 432

The example code referenced in your post trains two binary classifiers: a Naive Bayes model and an SVC model.

train_labels = np.zeros(702)
train_labels[351:701] = 1
train_matrix = extract_features(train_dir)

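This creates labels for 702 records, all initially 0, and then sets the second half to 1. The numbers in [] are a slice, start:stop, with the stop index excluded, so [351:701] covers indices 351 through 700; the last record (index 701) keeps label 0, and [351:702] would be needed to cover the full second half. The labels are binary, for example spam or ham, true or false. extract_features builds the {(docid, wordid) -> wordcount, ...} mapping that is the input to these models.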
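To make the slice notation concrete, here is a minimal sketch of how the label array is built. It assumes, as the example appears to, that the files are read in an order where all class-0 documents come before all class-1 documents. `make_labels` is a hypothetical helper, not part of the example code, showing how the same idea could carry over to a dataset with different class sizes.

```python
import numpy as np

# Reproduce the labels from the example: 702 zeros, then a slice set to 1.
train_labels = np.zeros(702)
train_labels[351:701] = 1  # start is inclusive, stop is exclusive: indices 351..700

# Inspect the values around both ends of the slice.
print(train_labels[350], train_labels[351], train_labels[700], train_labels[701])
print(int(train_labels.sum()))  # number of records labelled 1


def make_labels(n_class0, n_class1):
    """Hypothetical helper: n_class0 zeros followed by n_class1 ones."""
    labels = np.zeros(n_class0 + n_class1)
    labels[n_class0:] = 1  # omitting the stop index runs to the end of the array
    return labels


# For example, a dataset with 100 negative and 40 positive documents.
custom_labels = make_labels(100, 40)
print(custom_labels.shape)
```

The only thing that matters is that the order of the labels matches the order of the rows returned by extract_features; if your files are not grouped by class, you would need to derive each label from the file itself instead.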

Once the model is trained, you need to see how well it performs on a test set. Here the test set has 260 records: the first 130 are labelled 0 and the last 130 (indices 130 through 259) are labelled 1.

test_matrix = extract_features(test_dir)
test_labels = np.zeros(260)
test_labels[130:260] = 1

Finally, you run the predictions on the test set and compare them with test_labels to measure the accuracy of both models (NB and SVC).
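The comparison itself can be sketched without any model: once predictions exist, accuracy is the fraction of them that match test_labels. In the sketch below the `predictions` array is made up purely for illustration; in the real example it would presumably come from each trained model's predict call on test_matrix.

```python
import numpy as np

# True labels for the test set, built the same way as in the example.
test_labels = np.zeros(260)
test_labels[130:260] = 1

# Made-up predictions for illustration only: start from the true labels
# and flip a few so that the "model" is not perfect.
predictions = test_labels.copy()
predictions[[0, 1, 2]] = 1    # three class-0 records predicted as 1
predictions[[200, 201]] = 0   # two class-1 records predicted as 0

# Accuracy is the fraction of records where prediction and label agree.
accuracy = np.mean(predictions == test_labels)

# A 2x2 confusion matrix: rows are true labels, columns are predictions.
confusion = np.zeros((2, 2), dtype=int)
for true, pred in zip(test_labels.astype(int), predictions.astype(int)):
    confusion[true, pred] += 1

print(accuracy)
print(confusion)
```

The confusion matrix is often more informative than a single accuracy number, because it shows which class the errors fall in.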

Upvotes: 1
