Getting different accuracies using different Caffe classes (98.65 vs 98.1 vs 98.20)

Question

When I train and then test my model using Caffe's command-line interface, I get, for example, 98.65% accuracy. When I write my own code (given below) to calculate accuracy from the same pre-trained model using caffe.Net, I get 98.1%.
Everything is straightforward, and I have no idea what is causing the issue.
I also tried caffe.Classifier and its predict method, and got yet another, lower accuracy (98.20%).
Here is the code I wrote:

import sys
import caffe
import numpy as np
import lmdb
import argparse
from collections import defaultdict
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import itertools
from sklearn.metrics import roc_curve, auc
import random


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--proto', help='path to the network prototxt file(deploy)', type=str, required=True)
    parser.add_argument('--model', help='path to your caffemodel file', type=str, required=True)
    parser.add_argument('--mean', help='path to the mean file(.binaryproto)', type=str, required=True)
    #group = parser.add_mutually_exclusive_group(required=True)
    parser.add_argument('--db_type', help='lmdb or leveldb', type=str, required=True)
    parser.add_argument('--db_path', help='path to your lmdb/leveldb dataset', type=str, required=True)
    args = parser.parse_args()

    predicted_lables=[]
    true_labels = []
    misclassified =[]
    class_names = ['unsafe','safe']
    count=0
    correct = 0
    batch=[]
    plabe_ls=[]
    batch_size = 50
    cropx = 224
    cropy = 224
    i = 0
    multi_crop = False
    use_caffe_classifier = True

    caffe.set_mode_gpu() 
    # Extract mean from the mean image file
    mean_blobproto_new = caffe.proto.caffe_pb2.BlobProto()
    f = open(args.mean, 'rb')
    mean_blobproto_new.ParseFromString(f.read())
    mean_image = caffe.io.blobproto_to_array(mean_blobproto_new)
    f.close()

    net = caffe.Classifier(args.proto, args.model,
                           mean = mean_image[0].mean(1).mean(1),
                           image_dims = (224, 224))

    net1 = caffe.Net(args.proto, args.model, caffe.TEST) 
    net1.blobs['data'].reshape(batch_size, 3,224, 224)
    data_blob_shape = net1.blobs['data'].data.shape

    #check and see if its lmdb or leveldb
    if(args.db_type.lower() == 'lmdb'):
        lmdb_env = lmdb.open(args.db_path)
        lmdb_txn = lmdb_env.begin()
        lmdb_cursor = lmdb_txn.cursor()
        for key, value in lmdb_cursor:
            count += 1 
            datum = caffe.proto.caffe_pb2.Datum()
            datum.ParseFromString(value)
            label = int(datum.label)
            image = caffe.io.datum_to_array(datum).astype(np.float32)
            #key,image,label
            #buffer n image
            if(count % 5000 == 0):          
                print('{0} samples processed so far'.format(count))

            if(i < batch_size):
                i+=1
                inf= key,image,label
                batch.append(inf)
                #print(key)                 
            if(i >= batch_size):
                #process n image 
                ims=[]              
                for x in range(len(batch)):
                    img = batch[x][1]
                    #img has c,w,h shape! its already gone through transpose and channel swap when it was being saved into lmdb!
                    #Method III : use center crop just like caffe does in test time
                    if (use_caffe_classifier != True):
                        #center crop
                        c,w,h = img.shape
                        startx = h//2 - cropx//2
                        starty = w//2 - cropy//2
                        img = img[:, startx:startx + cropx, starty:starty + cropy]                  
                        #transpose the image so we can subtract from mean 
                        img = img.transpose(2,1,0)
                        img -= mean_image[0].mean(1).mean(1)
                        #transpose back to the original state
                        img = img.transpose(2,1,0)
                        ims.append(img)
                    else:
                        ims.append(img.transpose(2,1,0))    

                if (use_caffe_classifier != True): 
                    net1.blobs['data'].data[...] = ims[:]
                    out_1 = net1.forward()
                    plabe_ls = out_1['pred']                                                                                 
                else:
                    out_1 = net.predict(np.asarray(ims), oversample=multi_crop)
                    plabe_ls = out_1    

                plbl = np.asarray(plabe_ls)
                plbl = plbl.argmax(axis=1)
                for j in range(len(batch)):
                    if (plbl[j] == batch[j][2]):
                        correct+=1
                    else:
                        misclassified.append(batch[j][0])

                    predicted_lables.append(plbl[j])        
                    true_labels.append(batch[j][2]) 
                batch.clear()
                i = 0               


    sys.stdout.write("\rAccuracy: %.2f%%" % (100.*correct/count))
    sys.stdout.flush()
    print(", %i/%i corrects" % (correct, count))

What is causing this difference in accuracies?

More information:
I am using Python 3.5 on Windows.
I read images from an lmdb dataset.
The images are 256x256 and are center cropped to 224x224.
The model is fine-tuned from GoogLeNet.
For caffe.Classifier's predict method to work, I had to change classify.py.
In training, I just use Caffe's defaults, such as random crops at training time and a center crop at test time.

Changes:
I changed line 35 to:

 self.transformer.set_transpose(in_, (2, 1, 0))

and line 99 to:

predictions = predictions.reshape((len(predictions) // 10, 10, -1))

arnold · Accepted Answer

1) First, revert line 35 (32?) of classify.py from self.transformer.set_transpose(in_, (2, 1, 0)) back to the original self.transformer.set_transpose(in_, (2, 0, 1)). With the original setting, the Classifier expects HWC input and transposes it internally to CHW for downstream processing.

2) Run your Classifier branch as it is. You are likely to get a bad result; please check this. If so, the images in the database are not CWH as your comment says, but CHW. Once you have confirmed this, change ims.append(img.transpose(2,1,0)) in your Classifier branch to ims.append(img.transpose(1,2,0)) and re-test. The result should be either 98.2% (go to Step 3) or 98.65% (try Step 4).

3) If your result in Step 2 is 98.2%, also undo your second change to classify.py. In theory, since your images have even height and width, // and / should make no difference. If the result does differ, or the code crashes, something is seriously wrong with your image database: your assumption about the image size is incorrect, and you need to check it. The sizes could be off by a pixel or so, which could explain the slight discrepancies in accuracy.

4) If your result in Step 2 is 98.65%, you need to change the caffe.Net branch of your code. The database images are CHW, so change the first transpose to img = img.transpose(1,2,0) and the second transpose, after mean subtraction, to img = img.transpose(2,0,1). Then run your caffe.Net branch. If you still get 98.1% as before, check that mean subtraction is performed correctly by your network.

In Steps 2 and 4 it is also possible to get worse results. That would mean the problem is likely a difference between the mean subtraction used for your trained net and what your Python code assumes. Check this.
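The caffe.Net preprocessing in Step 4 can also be written without any transposes, because NumPy can broadcast a per-channel mean over a CHW array. The following is a minimal sketch, assuming the datum really is laid out as (channels, height, width). The function name center_crop_and_subtract_mean and the example mean values are hypothetical and not part of Caffe.

```python
import numpy as np


def center_crop_and_subtract_mean(img_chw, channel_mean, crop_h=224, crop_w=224):
    """Center crop a (channels, height, width) array and subtract a per-channel mean.

    Hypothetical helper for illustration; it is not part of Caffe.
    """
    c, h, w = img_chw.shape
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    cropped = img_chw[:, top:top + crop_h, left:left + crop_w].astype(np.float32)
    # Reshape the mean to (C, 1, 1) so it broadcasts over height and width.
    return cropped - np.asarray(channel_mean, dtype=np.float32).reshape(c, 1, 1)


# Example with a made-up 3 x 256 x 256 image and made-up channel means.
img = np.arange(3 * 256 * 256, dtype=np.float32).reshape(3, 256, 256)
mean = [104.0, 117.0, 123.0]
out = center_crop_and_subtract_mean(img, mean)
print(out.shape)  # (3, 224, 224)
```

This should give the same values as transposing to HWC, subtracting the mean, and transposing back to CHW, so it can serve as a cross-check for the transposes in your own branch.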

About your 98.2% for the caffe.Classifier:

If you look at lines 78-80, the center crop is done along crop_dims, not image_dims. If you then look at line 42 of the caffe.Classifier constructor, crop_dims is never set by the user; it is determined by the size of the net's input blob. Lastly, if you look at line 70, image_dims is used to resize the images prior to center cropping. So what happens with your setup is that the images are first resized to 224 x 224 and then uselessly center cropped to 224 x 224 (I assume this is the HxW of your net). You will obviously get poorer results than 98.65%. What you need to do is set image_dims = (256, 256). That prevents the resizing; the crop size is picked up automatically from your net, and you should get your 98.65%.
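To make the resize-then-crop arithmetic concrete, here is a small sketch of how the center-crop window depends on the resize size and the crop size. It is written from the description above rather than copied from Caffe's source, and the helper name center_crop_window is hypothetical.

```python
def center_crop_window(image_dims, crop_dims):
    """Return (top, left, bottom, right) of a center crop of size crop_dims
    taken from an image that was first resized to image_dims.

    Hypothetical helper for illustration; not Caffe's own code.
    """
    top = (image_dims[0] - crop_dims[0]) // 2
    left = (image_dims[1] - crop_dims[1]) // 2
    return top, left, top + crop_dims[0], left + crop_dims[1]


# image_dims equal to the net input: the "crop" is the entire resized image,
# so the 256x256 source was squeezed to 224x224 instead of being cropped.
print(center_crop_window((224, 224), (224, 224)))  # (0, 0, 224, 224)

# image_dims equal to the stored image size: no resize, and a true center crop.
print(center_crop_window((256, 256), (224, 224)))  # (16, 16, 240, 240)
```

With image_dims = (224, 224) the network sees a rescaled version of the whole image, while with (256, 256) it sees the same central 224 x 224 region that Caffe's test-time center crop uses.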
