Python - difference in confusion matrix dimension

Question

I have a question about confusion matrices. I use cross-validation to split 148 instances into two arrays, train and test. Then I call something like this:

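from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB

def GenerateResult(x_train, x_test, y_train, y_test):
    clf = OneVsRestClassifier(GaussianNB())
    clf.fit(x_train, y_train)
    predictions = clf.predict(x_test)
    accuracy = accuracy_score(y_test, predictions)
    confusion_mtrx = confusion_matrix(y_test, predictions)
    # Return both values, since the loop unpacks two results
    return accuracy, confusion_mtrx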

This runs inside a KFold loop, which calls the function above:

# pf is the KFold object
for train_idx, test_idx in pf.split(x_array):
    x_train, x_test = x_array[train_idx], x_array[test_idx]
    y_train, y_test = y_array[train_idx], y_array[test_idx]
    acc, confusion = GenerateResult(x_train, x_test, y_train, y_test)
    results['gaussian'].append(acc)
    confusion_dict['gaussian'].append(confusion)

Then I collect the results and calculate the mean accuracy:

np_gaussian = np.asarray(results['gaussian'])
print("[gaussian] Mean: {}".format(np.mean(np_gaussian)))

print(confusion_dict['gaussian'])

Here is my problem. My 148 instances have 4 output classes, but when I run the KFold loop I get confusion matrices of two different sizes. The first confusion matrix is 3x3:

[[36  1  1]
 [15 17  1]
 [ 0  0  3]]

The second is 4x4:

[[ 0  2  0  0]
 [ 0 41  2  0]
 [ 0 12 16  0]
 [ 0  0  1  0]]

I think the cause is the class distribution, because my 148 instances contain:

Class 1 - 2 ea
Class 2 - 81 ea
Class 3 - 61 ea
Class 4 - 4 ea
All classes - 148
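
To illustrate why the shapes can differ: when no labels argument is given, confusion_matrix builds its rows and columns only from the labels that occur in y_true or y_pred for that call. With classes as rare as 1 and 4, a fold can easily miss one of them. The two folds below are made-up labels for illustration, not my real data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical folds (made-up labels, not the real data set).
fold_a_true = np.array([2, 2, 3, 3, 4])   # class 1 never appears
fold_a_pred = np.array([2, 3, 3, 3, 4])
fold_b_true = np.array([1, 2, 2, 3, 4])   # all four classes appear
fold_b_pred = np.array([2, 2, 2, 3, 3])

# Without labels=, each matrix only covers the classes seen in that fold.
cm_a = confusion_matrix(fold_a_true, fold_a_pred)
cm_b = confusion_matrix(fold_b_true, fold_b_pred)

print(cm_a.shape, cm_b.shape)  # (3, 3) (4, 4)
```

That appears to match what I see: the 3x3 matrix would come from a fold where class 1 is missing from both the test labels and the predictions.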

What should I do about this? How can I sum these confusion matrices? What happens if I change the number of splits in KFold? I tried to use pandas but I have no idea how to do it. Please help. I am using scikit-learn.
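
One possible approach, as a minimal sketch rather than a definitive fix: pass the full list of classes through the labels parameter of confusion_matrix, so every fold returns a matrix with the same shape and ordering, and the matrices can then be added element-wise. The data below is a synthetic stand-in (an assumption) that only copies the class counts from the question. The names all_labels, kf and total are illustrative, and n_splits=2 is assumed because two 74-sample matrices are shown.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import KFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption): same class counts as in the question.
rng = np.random.default_rng(0)
y_array = np.repeat([1, 2, 3, 4], [2, 81, 61, 4])
x_array = rng.normal(size=(y_array.size, 3)) + y_array[:, None]

# Every class, taken from the full data set rather than from a single fold.
all_labels = np.unique(y_array)
kf = KFold(n_splits=2, shuffle=True, random_state=0)

total = np.zeros((all_labels.size, all_labels.size), dtype=np.int64)
accuracies = []
for train_idx, test_idx in kf.split(x_array):
    clf = OneVsRestClassifier(GaussianNB())
    clf.fit(x_array[train_idx], y_array[train_idx])
    predictions = clf.predict(x_array[test_idx])
    accuracies.append(accuracy_score(y_array[test_idx], predictions))
    # labels= fixes the row/column order, so each fold yields a 4x4 matrix
    # even when a class is missing from that fold.
    total += confusion_matrix(y_array[test_idx], predictions, labels=all_labels)

print("Mean accuracy:", np.mean(accuracies))
print(total)
```

Because every sample lands in a test fold exactly once, the row sums of the summed matrix equal the true class counts (2, 81, 61, 4), whatever value n_splits takes.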
