Anatoly Lushnikov

Reputation: 47

Caffe GoogleNet model predictions are always the same

I am trying to run the pretrained GoogLeNet model from the Caffe model zoo (no fine-tuning). The model and deploy.prototxt were both downloaded from https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet

Below is the code I'm using:

net = caffe.Net('deploy.prototxt', 'bvlc_googlenet.caffemodel', caffe.TEST)
net.blobs['data'].reshape(1,3,224,224)

image_path = '1.png'
img = caffe.io.load_image(image_path)
img = caffe.io.resize( img, (224, 224, 3) )

# mean subtraction
img[0,:,:] -= 104 / 255.0
img[1,:,:] -= 117 / 255.0
img[2,:,:] -= 123 / 255.0

# 224,224,3 -> 3,224,224
img = np.transpose(img, (2, 0, 1))

out = net.forward(data=np.array([img]))['prob']
print(np.argmax(out))

The model appears to load fine, but regardless of the input it always outputs the same class (885). What could be the reason?

UPD: The same problem occurs with other models, whether or not I do mean subtraction; only the class that is always predicted changes.

Upvotes: 1

Views: 888

Answers (1)

Arka Sadhu

Reputation: 36

I can see a few problems with the code. First, you should call np.transpose before subtracting the mean: after caffe.io.load_image and caffe.io.resize the image still has the shape (224,224,3), so img[0,:,:] selects the first row of pixels rather than the first channel. Second, you need to rescale the image from [0,1] to [0,255]. Third, Caffe expects the channels in a particular order; a short explanation is given here. You will have to change the default RGB to BGR format.
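To make the three fixes concrete, here is a minimal sketch of the manual preprocessing in plain NumPy. It assumes the input is an H x W x 3 RGB float array in [0, 1] (what caffe.io.load_image returns) and that the means 104, 117, 123 are given in BGR order on the 0-255 scale. The function name preprocess_manual and the random stand-in image are illustrative, not part of Caffe.

```python
import numpy as np

# Per-channel means on the 0-255 scale, assumed here to be in BGR order.
MEAN_BGR = np.array([104.0, 117.0, 123.0], dtype=np.float32)

def preprocess_manual(img_rgb01):
    """Turn an H x W x 3 RGB image in [0, 1] into a 3 x H x W BGR array
    in [0, 255] with the per-channel mean subtracted."""
    img = np.asarray(img_rgb01, dtype=np.float32)
    img = np.transpose(img, (2, 0, 1))      # (H, W, C) -> (C, H, W)
    img = img[::-1, :, :]                   # RGB -> BGR
    img = img * 255.0                       # [0, 1] -> [0, 255]
    img = img - MEAN_BGR[:, None, None]     # one mean per channel
    return img

# Random stand-in for the output of caffe.io.load_image + caffe.io.resize.
dummy = np.random.rand(224, 224, 3).astype(np.float32)
blob = preprocess_manual(dummy)
print(blob.shape)  # (3, 224, 224)
```

The result could then be passed as np.array([blob]) in place of the img array built in the question.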

I would recommend using caffe.io.Transformer, which packages all of these transformations cleanly.

For your example, the code with a transformer would be:

import numpy as np
import caffe

# net is the caffe.Net instance loaded in the question
net.blobs['data'].reshape(1, 3, 224, 224)

transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_mean('data', np.array([104, 117, 123]))  # per-channel mean, BGR order
transformer.set_transpose('data', (2, 0, 1))             # (H, W, C) -> (C, H, W)
transformer.set_channel_swap('data', (2, 1, 0))          # RGB -> BGR
transformer.set_raw_scale('data', 255.0)                 # [0, 1] -> [0, 255]

image_path = 'cat.jpg'
img = caffe.io.load_image(image_path)
img = caffe.io.resize(img, (224, 224, 3))

net.blobs['data'].data[...] = transformer.preprocess('data', img)
output = net.forward()
out = net.blobs['prob'].data[0].flatten()

# labels_file is not defined in this snippet; it should be the path to a
# class-label file with one label per line
labels = np.loadtxt(labels_file, str, delimiter='\t')
print(np.argmax(out))
print('output label : ' + labels[out.argmax()])
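When checking whether the predictions now change with the input, it can help to look at the top few classes instead of only the argmax. A small sketch follows; the helper top_k and the five-element probability vector are made up for illustration and stand in for net.blobs['prob'].data[0].

```python
import numpy as np

def top_k(probs, k=5):
    """Return the indices of the k largest entries, highest first."""
    probs = np.asarray(probs).ravel()
    order = np.argsort(probs)[::-1]   # indices sorted by descending probability
    return order[:k]

# Hypothetical probability vector standing in for the network output.
probs = np.array([0.05, 0.60, 0.10, 0.20, 0.05])
print(top_k(probs, k=3))  # [1 3 2]
```

If the top indices stay the same across clearly different images, the preprocessing is probably still wrong.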

Upvotes: 2
