mad

Reputation: 2789

What is wrong with my ROC curves when considering deep learning approaches?

I have an application that classifies images with CNNs. I am considering MobileNet, ResNet and DenseNet on 64 x 64 image blocks. To classify an image, I assign it the class that is most frequent among the predictions for its blocks. The problem is highly unbalanced: I have more positive samples than negative ones. I am considering three datasets.

To deal with this issue, I first calculated metrics such as F-measure and normalized accuracy. Here are the normalized accuracy results for some datasets with the three CNNs:

[Image: normalized accuracy results for the three CNNs]

To build the ROC curve, I define the score of an image as the mean score of its blocks, and this is where my problem begins. Please take a look at some of the ROC curves for these datasets with the three CNNs below:

[Images: ROC curves for the three CNNs]

It is quite weird to me that approaches with 50% normalized accuracy also got AUCs of 0.85, 0.90 and even 0.97. This last AUC seems to come from an almost perfect classifier, but how can that be possible if its normalized accuracy is 50%?

So, what are the reasons for that? Is it because:

1- My problem is unbalanced. Are the positive samples, which make up most of my datasets and are the class of interest in my ROC, influencing the result?

2- I am using the mean score of the blocks as the score of the image. Is there any way to overcome this issue?

Here is the code I use to generate labels and scores (PYTHON)

    # Imports assume standalone Keras, OpenCV, NumPy and scikit-image; adjust to your setup.
    import cv2
    import numpy as np
    from keras.applications.mobilenet import MobileNet
    from keras.layers import Dense, GlobalAveragePooling2D
    from keras.models import Model
    from skimage.util import view_as_windows

    # model_path, algorithm, test_images_path, dataset_name and du are defined elsewhere
    base_model = MobileNet(input_shape=(64, 64, 3), weights=None, include_top=False)
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(64, activation='relu')(x)
    predictions = Dense(2, activation='softmax')(x)
    model = Model(inputs=base_model.input, outputs=predictions)
    model.load_weights(model_path)

    intermediate_layer_model = Model(inputs=model.input, outputs=model.get_layer("dense_2").output)
    print("Loaded model from disk")
    intermediate_layer_model.compile(loss='categorical_crossentropy', optimizer=algorithm, metrics=['accuracy'])

    # read images, divide them into blocks, predict the blocks and use the mean score as the image score
    with open(test_images_path) as f:
        images_name = [a.strip() for a in f.readlines()]

    predicted_image_vector = []
    groundtruth_image_vector = []

    for line in images_name:
        x_test = []
        y_test = []
        print(line)
        image = cv2.imread(line, 1)
        # divide into non-overlapping 64 x 64 blocks
        windows = view_as_windows(image, (64, 64, 3), step=64)

        # prepare blocks to be tested later
        for i in range(windows.shape[0]):
            for j in range(windows.shape[1]):
                block = np.squeeze(windows[i, j])
                x_test.append(block)
                label = du.define_class(line)
                y_test.append(label)

        # predict scores for all blocks in the current test image (inside the per-image loop)
        intermediate_output = intermediate_layer_model.predict(np.asarray(x_test), batch_size=32, verbose=0)
        # the score for an image is the mean score of its blocks
        prediction_current_image = np.mean(intermediate_output, axis=0)
        predicted_image_vector.append(prediction_current_image)
        groundtruth_image_vector.append(np.argmax(np.bincount(np.asarray(y_test))))

    predicted_image_vector = np.array(predicted_image_vector)
    groundtruth_image_vector = np.array(groundtruth_image_vector)
    print("saving scores and labels to plot ROC curves")

    np.savetxt(dataset_name + '-scores.txt', predicted_image_vector, delimiter=',')
    np.savetxt(dataset_name + '-labels.txt', groundtruth_image_vector, delimiter=',')

Here is the code I use to generate ROC curve (MATLAB)

function plot_roc(labels_file, scores_file, name_file, dataset_name)

    format longG
    label=dlmread(labels_file);
    scores=dlmread(scores_file);
    [X,Y,T,AUC] = perfcurve(label,scores(:,2),1);   

    f = figure();
    plot(X, Y);
    title(['ROC Curves for Mobilenet in ' dataset_name])
    xlabel('False positive rate');
    ylabel('True positive rate');
    % convert the AUC to text explicitly before placing it on the plot
    txt = {'Area Under the Curve:', num2str(AUC)};
    text(0.5, 0.5, txt)
    saveas(f, name_file);
    disp("ok")

end

Upvotes: 2

Views: 290

Answers (1)

Mark.F

Reputation: 1694

From what I understand about your method: the input image is divided into separate patches that are processed independently by a CNN model. Each patch gets its own classification (or score, depending on whether it is taken after or before the softmax). Then the class of the image is determined by a vote over the classes of the patches.

But then, when you build your ROC curve, you use the mean score of the individual patches to determine the classification of the image.

These two different approaches are the reason for the discrepancy between the AUC and the normalized accuracy.

For example:

Say you have 3 patches in an image with the following probabilities (for 2 classes):

[cls a, cls b]

[0.51, 0.49]

[0.51, 0.49]

[0.01, 0.99]

By voting, class a is the prediction (2 patches vs 1); by mean score, class b is the prediction (0.657 vs 0.343).
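The same three-patch example can be checked with a few lines of Python. This is only a sketch of the two aggregation rules; the helper `argmax` and the variable names are mine, not taken from your code.

```python
from collections import Counter

# Per-patch probabilities from the example above: [cls a, cls b]
patch_probs = [
    [0.51, 0.49],
    [0.51, 0.49],
    [0.01, 0.99],
]

def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

# Rule 1: majority vote over the per-patch predicted classes
votes = [argmax(p) for p in patch_probs]
vote_winner = Counter(votes).most_common(1)[0][0]

# Rule 2: average the per-patch probabilities, then take the larger mean
n = len(patch_probs)
mean_probs = [sum(p[c] for p in patch_probs) / n for c in range(2)]
mean_winner = argmax(mean_probs)

print("votes:", votes, "-> class", vote_winner)
print("mean scores:", [round(m, 3) for m in mean_probs], "-> class", mean_winner)
```

Index 0 stands for class a and index 1 for class b, so the vote picks class a while the mean score picks class b.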

Personally I don't think that voting is the correct way to classify the image based on patches because it does not take into account the certainty of the model regarding different patches, as was shown in the example. But you are more familiar with your dataset, so perhaps I am wrong.

Regarding how to overcome your problem, I think some more info about the nature of the dataset and the task would help (how unbalanced it is, what the final goal is, etc.).
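One further thing that may be worth checking, although I cannot confirm it from what you posted: AUC only measures how well the scores rank positives above negatives, while normalized accuracy depends on one specific decision rule. If nearly every image ends up predicted as the majority class, normalized accuracy sits at 50% even when the ranking is very good. The sketch below uses made-up image-level scores and hypothetical helpers (`auc`, `balanced_accuracy`) to illustrate this, and picks the threshold that maximizes normalized accuracy.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def balanced_accuracy(labels, scores, threshold):
    """Mean of the true positive rate and true negative rate at a given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)

# Made-up scores for the positive class: 6 positive images, 2 negative images
labels = [1, 1, 1, 1, 1, 1, 0, 0]
scores = [0.95, 0.93, 0.90, 0.88, 0.85, 0.80, 0.70, 0.60]

print("AUC:", auc(labels, scores))
print("balanced accuracy at 0.5:", balanced_accuracy(labels, scores, 0.5))

# Try every observed score as a threshold and keep the best one
best = max(sorted(set(scores)), key=lambda t: balanced_accuracy(labels, scores, t))
print("best threshold:", best, "->", balanced_accuracy(labels, scores, best))
```

In this toy data every score is above 0.5, so all images are predicted positive: the AUC is 1.0 but the normalized accuracy is 0.5. If you tune a threshold like this, it should be chosen on a validation set rather than on the test images.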

Upvotes: 1
