Rezuana Haque

Reputation: 578

Grad-CAM outputs for all the images are the same

I am using Grad-CAM to see which regions of the test images are most important for the predictions of a ResNet50 model. The output I get looks wrong.

Code Snippets:

from tensorflow.keras.models import Model
import tensorflow as tf
import numpy as np
import cv2

class GradCAM:
    def __init__(self, model, classIdx, layerName=None):
        # store the model, the class index used to measure the class
        # activation map, and the layer to be used when visualizing
        # the class activation map
        self.model = model
        self.classIdx = classIdx
        self.layerName = layerName
        # if the layer name is None, attempt to automatically find
        # the target output layer
        if self.layerName is None:
            self.layerName = self.find_target_layer()

    def find_target_layer(self):
        # attempt to find the final convolutional layer in the network
        # by looping over the layers of the network in reverse order
        for layer in reversed(self.model.layers):
            # check to see if the layer has a 4D output
            if len(layer.output_shape) == 4:
                return layer.name
        # otherwise, we could not find a 4D layer so the GradCAM
        # algorithm cannot be applied
        raise ValueError("Could not find 4D layer. Cannot apply GradCAM.")


    def compute_heatmap(self, image, eps=1e-8):
        # construct our gradient model by supplying (1) the inputs
        # to our pre-trained model, (2) the output of the (presumably)
        # final 4D layer in the network, and (3) the output of the
        # softmax activations from the model
        gradModel = Model(
            inputs=self.model.inputs,
            outputs=[self.model.get_layer(self.layerName).output,
                     self.model.output])

        # record operations for automatic differentiation
        with tf.GradientTape() as tape:
            # cast the image tensor to a float-32 data type, pass the
            # image through the gradient model, and grab the loss
            # associated with the specific class index
            inputs = tf.cast(image, tf.float32)
            (convOutputs, predictions) = gradModel(inputs)
            
            # score the class index supplied to the constructor; using
            # tf.argmax(predictions[0]) here would silently ignore
            # self.classIdx
            loss = predictions[:, self.classIdx]
    
        # use automatic differentiation to compute the gradients
        grads = tape.gradient(loss, convOutputs)

        # compute the guided gradients
        castConvOutputs = tf.cast(convOutputs > 0, "float32")
        castGrads = tf.cast(grads > 0, "float32")
        guidedGrads = castConvOutputs * castGrads * grads
        # the convolution and guided gradients have a batch dimension
        # (which we don't need) so let's grab the volume itself and
        # discard the batch
        convOutputs = convOutputs[0]
        guidedGrads = guidedGrads[0]

        # average the guided gradients over the spatial dimensions to
        # obtain one weight per filter, then compute the weighted
        # combination of the filter maps
        weights = tf.reduce_mean(guidedGrads, axis=(0, 1))
        cam = tf.reduce_sum(tf.multiply(weights, convOutputs), axis=-1)

        # grab the spatial dimensions of the input image and resize
        # the output class activation map to match the input image
        # dimensions
        (w, h) = (image.shape[2], image.shape[1])
        heatmap = cv2.resize(cam.numpy(), (w, h))
        # normalize the heatmap such that all values lie in the range
        # [0, 1], scale the resulting values to the range [0, 255],
        # and then convert to an unsigned 8-bit integer
        numer = heatmap - np.min(heatmap)
        denom = (heatmap.max() - heatmap.min()) + eps
        heatmap = numer / denom
        heatmap = (heatmap * 255).astype("uint8")
        # return the resulting heatmap to the calling function
        return heatmap

    def overlay_heatmap(self, heatmap, image, alpha=0.5,
                        colormap=cv2.COLORMAP_VIRIDIS):
        # apply the supplied color map to the heatmap and then
        # overlay the heatmap on the input image
        heatmap = cv2.applyColorMap(heatmap, colormap)
        output = cv2.addWeighted(image, alpha, heatmap, 1 - alpha, 0)
        # return a 2-tuple of the color mapped heatmap and the output,
        # overlaid image
        return (heatmap, output)

Code snippet for visualising the heatmaps:

import random
import matplotlib.pyplot as plt

num_images = 5
random_indices = random.sample(range(len(X_test)), num_images)

for idx in random_indices:
    image = X_test[idx]
    # normalize the image and add a batch dimension before feeding it
    # to the model
    image1 = image.astype('float32') / 255
    image1 = np.expand_dims(image1, axis=0)
    preds = model.predict(image1) 
    i = np.argmax(preds[0])
    icam = GradCAM(model, i, 'conv5_block3_out') 
    heatmap = icam.compute_heatmap(image1)
    heatmap = cv2.resize(heatmap, (224, 224))
    (heatmap, output) = icam.overlay_heatmap(heatmap, image, alpha=0.5)
    fig, ax = plt.subplots(1, 3)
    ax[0].imshow(heatmap)
    ax[1].imshow(image)
    ax[2].imshow(output)
    plt.show()

The output:

[Screenshot of the plotted output: for each of the five test images, the heatmap and Grad-CAM overlay panels look identical across all images]

The problem I am facing is that, as you can see in the output, the original images are different, but the heatmaps and Grad-CAM overlays are the same for every image. I don't know the reason behind this.

Upvotes: 2

Views: 1234

Answers (1)

Msgun

Reputation: 172

This question is a bit old, but here is my answer in case others are facing a similar problem with Grad-CAM outputs.

I saw a similar issue when using ResNet50, but by printing the individual Grad-CAM heatmaps I was able to see that they had different values, even though the plotted outputs looked similar. As you can also see in the Colab link from the comments, your Grad-CAM computation looks fine.
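Even before plotting, you can verify this numerically. Here is a minimal sketch of such a check (the helper heatmap_for is mine, not from the question, and it assumes model, X_test, and the GradCAM class above are in scope):

import numpy as np

def heatmap_for(idx):
    # preprocess exactly as in the question's visualization loop
    img = np.expand_dims(X_test[idx].astype("float32") / 255, axis=0)
    classIdx = int(np.argmax(model.predict(img)[0]))
    cam = GradCAM(model, classIdx, "conv5_block3_out")
    return cam.compute_heatmap(img)

h0, h1 = heatmap_for(0), heatmap_for(1)
print("identical:", np.array_equal(h0, h1))
# cast to int to avoid uint8 wrap-around when subtracting
print("max abs diff:", np.abs(h0.astype(int) - h1.astype(int)).max())

If this prints identical: False with a non-trivial difference, the computation is working and only the visualization makes the maps look alike.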

Even though it is a bit hard to spot in your attached picture, I can see that at least the first two Grad-CAMs are different. So the saliency maps actually are different for different images; they are just not good enough. I resolved mine by replacing the ResNet50 with a MobileNetV2 model, which led to much better saliency and classification performance.

As the No Free Lunch theorem states, no single model is suited to every problem or dataset, so you may have to experiment with a different model.
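For reference, here is a minimal sketch of that swap (num_classes and classIdx are placeholders for your own setup; 'out_relu' is the last 4D activation in Keras' MobileNetV2, so find_target_layer() would also pick it up automatically):

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# MobileNetV2 backbone with a fresh classification head; fine-tune it
# on your own data before computing Grad-CAMs
base = MobileNetV2(weights="imagenet", include_top=False,
                   input_shape=(224, 224, 3))
x = GlobalAveragePooling2D()(base.output)
outputs = Dense(num_classes, activation="softmax")(x)  # num_classes: placeholder
model = Model(inputs=base.input, outputs=outputs)

# the question's GradCAM class works unchanged with the new layer name
icam = GradCAM(model, classIdx, layerName="out_relu")  # classIdx: placeholder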

Upvotes: 0
