Reputation: 595
I am trying to display the precision, recall and F-measure, but they are extremely low. Do you know why?
import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

total_verbatim = X.shape[0]
print(total_verbatim)
labels = np.zeros(total_verbatim)  # create the label array; inspect mislabelled + correctly labelled samples
# error with the configuration over the whole set
labels[0:1316] = 0    # motivations
labels[1316:1891] = 1  # freins
cv_splitter = KFold(n_splits=10, shuffle=False, random_state=None)
model1 = LinearSVC()
model2 = MultinomialNB()
models = [model1, model2]
for model in models:
    # verbatim_preprocess = np.array(verbatim_train_remove_stop_words_lemmatize)
    y_pred = cross_val_predict(model, X, labels, cv=cv_splitter)
    print("Model: {}".format(model))
    print("Confusion matrix: {}".format(confusion_matrix(labels, y_pred)))
    print("Accuracy: {}".format(accuracy_score(labels, y_pred)))
    print("Precision: {}".format(precision_score(labels, y_pred)))
    print("Recall: {}".format(recall_score(labels, y_pred)))
    print("F-measure: {}".format(f1_score(labels, y_pred)))
Here is the result; when computing manually, the results are much higher for precision and recall:
Model: LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
intercept_scaling=1, loss='squared_hinge', max_iter=1000,
multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
verbose=0)
Confusion matrix: [[963 353]
[518 57]]
Accuracy: 0.5393971443680592
Precision: 0.13902439024390245
Recall: 0.09913043478260869
F-measure: 0.11573604060913706
Model: MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
Confusion matrix: [[1248 68]
[ 574 1]]
Accuracy: 0.6604970914859862
Precision: 0.014492753623188406
Recall: 0.0017391304347826088
F-measure: 0.0031055900621118015
Upvotes: 0
Views: 46
Reputation: 33127
Yes. The classification model is not good.
By looking only at the confusion matrix of the first model:
Confusion matrix: [[963 353]
[518 57]]
You can see that 353 samples from class 0 and 518 samples from class 1 are misclassified, respectively.
Ideally, you should have counts only on the diagonal.
Also, the accuracy is almost 0.5, i.e., you predict at chance level.
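You can verify the reported scores directly from that confusion matrix; this quick check (using only the numbers shown above) reproduces the low precision and recall:

```python
import numpy as np

# Confusion matrix of the first model: rows = true class, columns = predicted class
cm = np.array([[963, 353],
               [518,  57]])

tp = cm[1, 1]  # class-1 samples correctly predicted as class 1
fp = cm[0, 1]  # class-0 samples wrongly predicted as class 1
fn = cm[1, 0]  # class-1 samples wrongly predicted as class 0

precision = tp / (tp + fp)  # 57 / 410  ≈ 0.139
recall = tp / (tp + fn)     # 57 / 575  ≈ 0.099
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)
```

So the metrics are consistent with the confusion matrix: very few class-1 samples land on the diagonal, hence the low scores.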
Similarly for model 2.
To improve the models, try different hyperparameters and different numbers of folds. Do a GridSearch; see here.
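A minimal GridSearchCV sketch along these lines; the parameter grid here is an illustrative guess, not a tuned set, and the synthetic data stands in for your own `X` and `labels`:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import LinearSVC

# Synthetic stand-in for the question's X / labels (imbalanced, two classes)
X, labels = make_classification(n_samples=1891, weights=[0.7], random_state=0)

# Candidate values for the regularization strength C (assumed grid, not tuned)
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}
cv_splitter = KFold(n_splits=10, shuffle=True, random_state=42)

grid = GridSearchCV(LinearSVC(), param_grid, scoring="f1", cv=cv_splitter)
grid.fit(X, labels)
print(grid.best_params_, grid.best_score_)
```

`best_params_` then gives the C value that maximizes cross-validated F1, which you can plug back into the model from the question.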
Upvotes: 1