LdM

Reputation: 704

Classification report results

I think there is some issue in my parameters, as I am getting different results. Since the amount of code is huge, I will paste only the relevant parts. I am using different models to predict whether an account is fake or not. An example of a model is the following:

from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Random forest on top of the shared count-vectorizer features
rf = Pipeline([
        ('rfCV', FeaturesSelection.countVect),
        ('rf_clf', RandomForestClassifier(n_estimators=200, n_jobs=3))
        ])

rf.fit(DataPreparation.train_acc['Acc'], DataPreparation.train_acc['Label'])
predicted_rf = rf.predict(DataPreparation.test_acc['Acc'])
np.mean(predicted_rf == DataPreparation.test_acc['Label'])  # hold-out accuracy

Then I use K-fold cross-validation:

from sklearn.model_selection import KFold
from sklearn.metrics import confusion_matrix, f1_score

def build_confusion_matrix(classifier):

    k_fold = KFold(n_splits=5)
    scores = []
    confusion = np.array([[0, 0], [0, 0]])

    for train_ind, test_ind in k_fold.split(DataPreparation.train_acc):
        train_text = DataPreparation.train_acc.iloc[train_ind]['Acc']
        train_y = DataPreparation.train_acc.iloc[train_ind]['Label']

        test_text = DataPreparation.train_acc.iloc[test_ind]['Acc']
        test_y = DataPreparation.train_acc.iloc[test_ind]['Label']

        classifier.fit(train_text, train_y)
        predictions = classifier.predict(test_text)

        confusion += confusion_matrix(test_y, predictions)
        scores.append(f1_score(test_y, predictions))

    # Average F1 over the folds (computed on splits of the training data,
    # not on the hold-out test set)
    print('Score:', sum(scores) / len(scores))
    return confusion

Applying it to all the classifiers:

build_confusion_matrix(nb_pipeline)
build_confusion_matrix(svm_pipeline)
build_confusion_matrix(rf)

I get:

Score: 0.5697
Score: 0.5325
Score: 0.5857

However, if I want to create classification reports as follows:

print(classification_report(DataPreparation.test_acc['Label'], predicted_nb))
print(classification_report(DataPreparation.test_acc['Label'], predicted_svm))
print(classification_report(DataPreparation.test_acc['Label'], predicted_rf))

The output is different. For example: (NB)

              precision    recall  f1-score   support

         0.0       0.97      0.86      0.91       580
         1.0       0.41      0.72      0.53        80

(SVM)

              precision    recall  f1-score   support

         0.0       0.94      0.96      0.95       580
         1.0       0.61      0.53      0.52        80

If I create a summary report as follows:

from sklearn.metrics import f1_score, precision_score, recall_score, accuracy_score

f1 = f1_score(DataPreparation.test_acc['Label'], predicted_rf)
pres = precision_score(DataPreparation.test_acc['Label'], predicted_rf)
rec = recall_score(DataPreparation.test_acc['Label'], predicted_rf)
acc = accuracy_score(DataPreparation.test_acc['Label'], predicted_rf)

# res is a DataFrame with these columns, created earlier
res = res.append({'Precision': pres, 'Recall': rec,
                  'F1-score': f1, 'Accuracy': acc}, ignore_index=True)

I also get different results.

I am looking at the F1-score, and I would expect the same value from the cross-validation score, the classification reports, and the summary table.

Could you please tell me if you spot any error in the parameters I am using for building the classification reports, score, and/or summary table?

Upvotes: 0

Views: 835

Answers (1)

Gaussian Prior

Reputation: 786

The F1 score is inherently tied to a class; that is why there are two F1 scores in your classification report. When you call f1_score(true, predicted) it returns a single number, which according to sklearn's documentation defaults to the F1 score of the class designated as positive (source: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html, Parameters > average). The classification report returns several kinds of averages; however, the one you included is the micro F1 score, which differs from the previous F1 score and is computed from the total true positives, false negatives and false positives. (If you check the example at https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html, the micro F1 for class 2 is 80% because two '2's were classified correctly as '2', two other instances were classified correctly without being '2', and one '2' was not classified as '2'.)

Now, if the very first score you posted differs from the last one even though both come from the same sklearn function, that is because the first number was obtained from a cross-validation scheme on your training data, while the last one was computed on the hold-out test set.
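
As a quick illustration, here is a minimal sketch on toy labels (not the asker's data) showing how the single number returned by f1_score relates to the per-class and averaged values printed by classification_report:

from sklearn.metrics import classification_report, f1_score

# Toy binary labels, purely illustrative
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]

# Default average='binary': the F1 of the positive class (label 1) only
print(f1_score(y_true, y_pred))

# One F1 per class: these are the values in the report's class rows
print(f1_score(y_true, y_pred, average=None))

# Micro average: computed from the pooled TP/FP/FN over both classes
print(f1_score(y_true, y_pred, average='micro'))

# The report lists the per-class values plus the averages
print(classification_report(y_true, y_pred))

Running the same f1_score inside a KFold loop on the training split will generally give yet another number, because each fold is scored on a different subset of the data than the final hold-out test set.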

Upvotes: 1
