Reputation: 969
I am using k-fold cross validation to find the optimal value of the additive smoothing parameter alpha. I also want to plot the curves of training accuracy and validation accuracy against the values of alpha. I wrote code for that:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

alphas = np.arange(0.0001, 1.5, 0.0001)

# empty lists that store cv scores and training scores
cv_scores = []
training_scores = []

# perform k-fold cross validation for each alpha
for alpha in alphas:
    naive_bayes = MultinomialNB(alpha=alpha)
    scores = cross_val_score(naive_bayes, x_train_counts, y_train, cv=20, scoring='accuracy')
    scores_training = naive_bayes.fit(x_train_counts, y_train).score(x_train_counts, y_train)
    cv_scores.append(scores.mean())
    training_scores.append(scores_training)

# plot cross-validated score and training score vs alpha
plt.plot(alphas, cv_scores, 'r', label='validation accuracy')
plt.plot(alphas, training_scores, 'b', label='training accuracy')
plt.xlabel('alpha')
plt.ylabel('score')
plt.legend()
plt.show()
Is this the correct way to implement this?
Upvotes: 0
Views: 1362
Reputation: 514
If you also want to tune other model hyperparameters, it may be easier to use a grid search. With GridSearchCV you can add extra hyperparameters to the search with very little code, and training scores are computed for you. See my implementation below.
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV

# search the same alpha range as in your loop
parameters = {'alpha': np.arange(0.0001, 1.5, 0.0001)}
clf = GridSearchCV(MultinomialNB(), parameters, cv=20, return_train_score=True)
clf.fit(x_train_counts, y_train)
print('Mean train set scores: {}'.format(clf.cv_results_['mean_train_score']))
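Once the grid search is fitted, the winning alpha and its cross-validated score are available directly on the object. A minimal, self-contained sketch, using small random count data in place of your `x_train_counts`/`y_train` and a coarser alpha grid to keep it fast:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV

# synthetic non-negative count features standing in for x_train_counts / y_train
rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(100, 10))
y = rng.randint(0, 2, size=100)

params = {'alpha': [0.001, 0.01, 0.1, 1.0]}
clf = GridSearchCV(MultinomialNB(), params, cv=5, return_train_score=True)
clf.fit(X, y)

best_alpha = clf.best_params_['alpha']  # alpha with the best mean CV accuracy
best_score = clf.best_score_            # that best mean CV accuracy
print(best_alpha, best_score)
```

`cv_results_` also holds `mean_test_score` per alpha, so you can still draw the same training/validation curves from the grid search results instead of a manual loop.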
Upvotes: 1