I'm trying to train and test a Bayesian Classifier in Python.
These lines of code are from an example I found here, but I don't understand what they do.
train_labels = np.zeros(702)
train_labels[351:701] = 1
train_matrix = extract_features(train_dir)
There is a similar code block later for the test set:
test_matrix = extract_features(test_dir)
test_labels = np.zeros(260)
test_labels[130:260] = 1
I'm wondering what this does and how I can apply it to a different classification example. What do the numbers in [] mean?
Many thanks
The example code referenced in your post trains a binary classifier with two models, Naive Bayes and SVC.
train_labels = np.zeros(702)
train_labels[351:701] = 1
train_matrix = extract_features(train_dir)
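This creates labels for 702 records, all initialised to 0, and then sets the second half to 1. These are binary labels, such as spam vs. ham or true vs. false. The numbers in [] are a slice, start:stop, which selects the elements from index start up to but not including index stop. So train_labels[351:701] covers indices 351 to 700. Because the array runs from index 0 to 701, the last record (index 701) stays 0, which looks like an off-by-one slip in the example; train_labels[351:] (or train_labels[351:702]) would label the whole second half. extract_features builds the {(docid, wordid)->wordcount,..} mapping, i.e. how many times each word occurs in each document, and that is the input to these models.
The hard-coded split assumes the rows of train_matrix come in the same order: the first 351 documents belong to class 0 and the rest to class 1. For a different data set, each label has to match the class of the corresponding row in the feature matrix.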
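A quick way to see what the slice does is to run the same two lines and inspect the boundaries. This is a minimal sketch using only NumPy; labels and fixed are illustrative names, not from the example.

```python
import numpy as np

# Same pattern as the example: 702 labels, all 0.0 to start
labels = np.zeros(702)
labels[351:701] = 1            # sets indices 351..700; the stop index 701 is excluded
print(labels[350], labels[351], labels[700], labels[701])  # 0.0 1.0 1.0 0.0
print(int(labels.sum()))       # 350 records labelled 1, not 351

fixed = np.zeros(702)
fixed[351:] = 1                # omitting stop runs the slice to the end of the array
print(int(fixed.sum()))        # 351
```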
Once you have trained a model, you need to check how well it performs on a test set. Here 260 records are used as the test set, with the first half (indices 0 to 129) labelled 0 and the second half (indices 130 to 259) labelled 1.
test_matrix = extract_features(test_dir)
test_labels = np.zeros(260)
test_labels[130:260] = 1
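To apply the same idea to another data set, the labels only need to line up with the rows of the feature matrix. A minimal sketch with hypothetical helper names: make_block_labels covers data stored class by class (as the example assumes), and labels_from_names covers files whose class can be read from the file name, assuming the list of names is in the same order the feature extractor reads the files. The "spam_" prefix is a made-up naming convention.

```python
import numpy as np

def make_block_labels(class_counts):
    """Labels for rows stored class by class: class 0 rows first, then class 1, and so on."""
    # np.repeat(np.arange(k), counts) yields counts[0] zeros, then counts[1] ones, ...
    return np.repeat(np.arange(len(class_counts)), class_counts)

def labels_from_names(filenames, is_positive):
    """Labels read from each file's own name, so the file order no longer matters."""
    return np.array([1 if is_positive(name) else 0 for name in filenames])

train_labels = make_block_labels([351, 351])   # 702 labels: 351 zeros, then 351 ones
test_labels = make_block_labels([130, 130])    # 260 labels: 130 zeros, then 130 ones

# Hypothetical naming convention: positive-class files start with "spam_"
names = ["ham_001.txt", "spam_001.txt", "ham_002.txt"]
print(labels_from_names(names, lambda n: n.startswith("spam_")))  # [0 1 0]
```

make_block_labels also works for more than two classes; for example, make_block_labels([100, 80, 120]) gives 100 zeros, 80 ones and 120 twos.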
Finally, you run both models (NB and SVC) on test_matrix and evaluate how closely their predictions match test_labels, e.g. by computing the accuracy.
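To make the train/predict/evaluate step concrete, here is a minimal sketch on a made-up three-word count matrix. It swaps in a small from-scratch multinomial Naive Bayes written with NumPy only, so it runs without other dependencies; it is not the code from the example, and in practice a library model such as scikit-learn's MultinomialNB would normally be used. The toy matrices and the fit_multinomial_nb/predict_multinomial_nb helpers are hypothetical.

```python
import numpy as np

def fit_multinomial_nb(X, y, alpha=1.0):
    """Estimate log class priors and per-class log word probabilities (Laplace smoothing alpha)."""
    classes = np.unique(y)
    log_prior = np.log(np.array([np.mean(y == c) for c in classes]))
    # Total count of each word within each class, plus alpha so no probability is zero
    word_counts = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    log_likelihood = np.log(word_counts / word_counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_likelihood

def predict_multinomial_nb(model, X):
    """Pick the class with the highest log prior + sum of (count * log word probability)."""
    classes, log_prior, log_likelihood = model
    scores = X @ log_likelihood.T + log_prior
    return classes[np.argmax(scores, axis=1)]

# Toy document-by-word count matrices (rows = documents, columns = words)
train_matrix = np.array([[3, 0, 1], [2, 0, 0], [4, 1, 0],
                         [0, 3, 2], [0, 2, 3], [1, 4, 1]])
train_labels = np.zeros(6)
train_labels[3:6] = 1            # same pattern as the example: second half is class 1

test_matrix = np.array([[2, 0, 1], [0, 2, 1]])
test_labels = np.zeros(2)
test_labels[1:2] = 1

model = fit_multinomial_nb(train_matrix, train_labels)
predictions = predict_multinomial_nb(model, test_matrix)
accuracy = np.mean(predictions == test_labels)
print(predictions, accuracy)      # [0. 1.] 1.0
```

An SVC would slot into the same steps: fit on train_matrix and train_labels, predict on test_matrix, then compare the predictions with test_labels.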