Reputation: 75
I am currently working on multi-label image classification with a CNN in Keras. In addition to the accuracy reported by Keras, I re-checked the results with scikit-learn using several metrics (recall, precision, F1 score and accuracy).
Keras reports an accuracy of about 90%, while scikit-learn reports only about 60%.
I do not understand why the two differ, so please let me know.
Is there something wrong with the Keras calculation?
I use sigmoid for the activation function, binary_crossentropy for the loss function, and adam for the optimizer.
Keras training
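# Imports are assumed; the original post does not show them
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Input, Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model

input_tensor = Input(shape=(img_width, img_height, 3))  # defined but not passed to the model below
base_model = MobileNetV2(include_top=False, weights='imagenet')
#model.summary()
x = base_model.output
x = GlobalAveragePooling2D()(x)
#x = Dense(2048, activation='relu')(x)
#x = Dropout(0.5)(x)
x = Dense(1024, activation='relu')(x)
x = Dropout(0.5)(x)
# One sigmoid unit per label (6 labels)
predictions = Dense(6, activation='sigmoid')(x)
# Freeze the pretrained backbone so only the new head is trained
for layer in base_model.layers:
    layer.trainable = False
model = Model(inputs=base_model.input, outputs=predictions)
print("{} layers".format(len(model.layers)))
# sgd is an optimizer object created elsewhere (not shown)
model.compile(optimizer=sgd, loss="binary_crossentropy", metrics=["acc"])
history = model.fit(X_train, y_train, epochs=50, validation_data=(X_val, y_val), batch_size=64, verbose=2)
model_evaluate()  # helper defined elsewhere (not shown)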
Keras showed 90% (Accuracy).
scikit-learn check
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

thresholds = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
y_pred = model.predict(X_test)
for val in thresholds:
    print("For threshold: ", val)
    pred = y_pred.copy()
    pred[pred >= val] = 1
    pred[pred < val] = 0
    accuracy = accuracy_score(y_test, pred)
    precision = precision_score(y_test, pred, average='micro')
    recall = recall_score(y_test, pred, average='micro')
    f1 = f1_score(y_test, pred, average='micro')
    print("Micro-average quality numbers")
    print("Acc: {:.4f}, Precision: {:.4f}, Recall: {:.4f}, F1-measure: {:.4f}".format(accuracy, precision, recall, f1))
Output(scikit-learn)
For threshold: 0.1
Micro-average quality numbers
Acc: 0.0727, Precision: 0.3776, Recall: 0.8727, F1-measure: 0.5271
For threshold: 0.2
Micro-average quality numbers
Acc: 0.1931, Precision: 0.4550, Recall: 0.8033, F1-measure: 0.5810
For threshold: 0.3
Micro-average quality numbers
Acc: 0.3323, Precision: 0.5227, Recall: 0.7403, F1-measure: 0.6128
For threshold: 0.4
Micro-average quality numbers
Acc: 0.4574, Precision: 0.5842, Recall: 0.6702, F1-measure: 0.6243
For threshold: 0.5
Micro-average quality numbers
Acc: 0.5059, Precision: 0.6359, Recall: 0.5858, F1-measure: 0.6098
For threshold: 0.6
Micro-average quality numbers
Acc: 0.4597, Precision: 0.6993, Recall: 0.4707, F1-measure: 0.5626
For threshold: 0.7
Micro-average quality numbers
Acc: 0.3417, Precision: 0.7520, Recall: 0.3383, F1-measure: 0.4667
For threshold: 0.8
Micro-average quality numbers
Acc: 0.2205, Precision: 0.7863, Recall: 0.2132, F1-measure: 0.3354
For threshold: 0.9
Micro-average quality numbers
Acc: 0.1063, Precision: 0.8987, Recall: 0.1016, F1-measure: 0.1825
Upvotes: 4
Views: 2036
Reputation: 637
There are two ways to count a correct answer in multi-label classification.
1. Count a prediction as correct only if all of its sub-labels are correct (exact match). Example: in the demo dataset below, y_true has 5 samples, and y_pred gets 3 of them fully correct. In this case, the accuracy is 60%.
2. Count every sub-label separately. Example: in the demo dataset, y_true contains 15 labels in total (5 samples with 3 labels each), and y_pred gets 10 of them right. In this case, the accuracy is 66.7%.
scikit-learn's accuracy_score handles multi-label input as described in point 1, whereas the Keras accuracy metric follows the method described in point 2. A code example is given below.
Code:
import tensorflow as tf
from sklearn.metrics import accuracy_score
import numpy as np
# A demo dataset
y_true = np.array([[0, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 0, 1]])
y_pred = np.array([[1, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0], [1, 0, 1]])
kacc = tf.keras.metrics.Accuracy()
_ = kacc.update_state(y_true, y_pred)
print(f'Keras Accuracy acc: {kacc.result().numpy()*100:.3}')
kbacc = tf.keras.metrics.BinaryAccuracy()
_ = kbacc.update_state(y_true, y_pred)
print(f'Keras BinaryAccuracy acc: {kbacc.result().numpy()*100:.3}')
print(f'SkLearn acc: {accuracy_score(y_true, y_pred)*100:.3}')
Outputs:
Keras Accuracy acc: 66.7
Keras BinaryAccuracy acc: 66.7
SkLearn acc: 60.0
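To see where the two numbers come from without either library, the same counts can be done by hand. This is only a sketch in plain Python that reuses the demo arrays above; the helper names exact_match_accuracy and per_label_accuracy are made up for illustration.

```python
# Same demo data as above, as plain nested lists
y_true = [[0, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0], [1, 0, 1]]


def exact_match_accuracy(y_true, y_pred):
    """Method 1: a sample counts only if every one of its labels matches."""
    matches = sum(t_row == p_row for t_row, p_row in zip(y_true, y_pred))
    return matches / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Method 2: every individual label counts on its own."""
    pairs = [
        (t, p)
        for t_row, p_row in zip(y_true, y_pred)
        for t, p in zip(t_row, p_row)
    ]
    return sum(t == p for t, p in pairs) / len(pairs)


print(f"Exact match: {exact_match_accuracy(y_true, y_pred):.1%}")  # 3 of 5 samples  -> 60.0%
print(f"Per label:   {per_label_accuracy(y_true, y_pred):.1%}")    # 10 of 15 labels -> 66.7%
```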
So you have to pick one of the two definitions. If you go with method 1, you have to implement the accuracy metric yourself in Keras. However, multi-label training is generally done with sigmoid outputs and binary_crossentropy loss, and binary_crossentropy scores each label independently, which corresponds to method 2. It therefore makes sense to evaluate with method 2 as well.
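If you want both numbers side by side for your own model, one option is to threshold the sigmoid outputs yourself and compute both accuracies on the same predictions. The following is a rough sketch with NumPy, not a definitive implementation. The function name multilabel_accuracies and the values in y_prob are made up for illustration; in practice y_prob would be the output of model.predict(X_test).

```python
import numpy as np


def multilabel_accuracies(y_true, y_prob, threshold=0.5):
    """Return (per_label_accuracy, exact_match_accuracy) for sigmoid outputs."""
    y_true = np.asarray(y_true)
    # Turn probabilities into hard 0/1 predictions
    pred = (np.asarray(y_prob) >= threshold).astype(y_true.dtype)
    hits = pred == y_true
    per_label = hits.mean()                # method 2: average over every label
    exact_match = hits.all(axis=1).mean()  # method 1: every label in the row must match
    return float(per_label), float(exact_match)


# Hypothetical probabilities, chosen so that thresholding at 0.5
# reproduces the demo y_pred used earlier
y_true = np.array([[0, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 0, 1]])
y_prob = np.array([
    [0.9, 0.2, 0.1],
    [0.8, 0.1, 0.3],
    [0.4, 0.3, 0.2],
    [0.1, 0.2, 0.3],
    [0.7, 0.4, 0.6],
])

per_label, exact_match = multilabel_accuracies(y_true, y_prob)
print(f"Per-label accuracy:   {per_label:.3f}")
print(f"Exact-match accuracy: {exact_match:.3f}")
```

Computing both from the same thresholded predictions makes the gap easy to see: the exact-match figure can never be higher than the per-label one, and it drops quickly as the number of labels per sample grows.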
Upvotes: 5