Reputation: 91
I am using the Caffe framework for Windows (downloaded from here) on a Windows 7 64-bit machine, with C++ in Visual Studio Community 2013. I use the pre-trained GoogLeNet model to extract the output of the loss1-fc layer as a feature vector for each image. So far so good.
Recently I tried adapting my software to work with video frames, so I changed the first layer from an ImageData layer to a MemoryData layer. That way I can send Caffe a vector of OpenCV Mats instead of the naive approach of writing each frame to disk and passing Caffe a file list.
Now I noticed that I don't get the same results for the same images! With the ImageData layer there was no such problem.
I use the CPU only (no cuDNN, no GPU).
The function I use for feature extraction is the following:
void feature_extraction_pipeline_memory(boost::shared_ptr<Net<Dtype>> feature_extraction_net,
                                        vector<cv::Mat> imgs,
                                        vector<int> labels,
                                        float** blobFeats,
                                        vector<string> blob_names) {
    // Feed all frames into the MemoryData layer at once.
    boost::dynamic_pointer_cast<caffe::MemoryDataLayer<float>>(
        feature_extraction_net->layers()[0])->AddMatVector(imgs, labels);

    size_t num_mini_batches = imgs.size();  // batch_size is 1, so one batch per image
    size_t num_features = blob_names.size();
    int dim_features;
    int batch_size;
    vector<Blob<float>*> input_vec;
    vector<int> image_indices(num_features, 0);

    for (size_t batch_index = 0; batch_index < num_mini_batches; ++batch_index) {
        feature_extraction_net->Forward(input_vec);
        for (size_t i = 0; i < num_features; ++i) {
            const boost::shared_ptr<Blob<Dtype>> feature_blob =
                feature_extraction_net->blob_by_name(blob_names[i]);
            batch_size = feature_blob->num();
            dim_features = feature_blob->count() / batch_size;
            const Dtype* feature_blob_data;
            for (int n = 0; n < batch_size; ++n) {
                // Copy the flattened features of image n into the output buffer.
                feature_blob_data = feature_blob->cpu_data() + feature_blob->offset(n);
                for (int d = 0; d < dim_features; ++d)
                    blobFeats[i][(image_indices[i] * dim_features) + d] = feature_blob_data[d];
                ++image_indices[i];
            } // n < batch_size
        } // i < num_features
    } // batch_index < num_mini_batches
}
The imgs vector is a vector of cv::Mat and labels is a vector of int, all set to 0. To rule out a loading problem, I wrote all the images back to disk after they were added to the vector and checked them; they were fine, so nothing goes wrong when loading the images. By the way, I use OpenCV 3.1.
The MemoryData layer in the GoogLeNet prototxt file is declared as follows:
layer {
  name: "data"
  type: "MemoryData"
  top: "data"
  top: "label"
  memory_data_param {
    batch_size: 1
    channels: 3
    height: 227
    width: 227
  }
  transform_param {
    crop_size: 227
    mirror: true
    mean_file: "model_googlenet_mem/imagenet_mean.binaryproto"
  }
  include: { phase: TEST }
}
and is the first layer.
I print the first 10 values for each image. Note that images 0, 1, 2 and 3 are EXACT copies of the same file, and the same holds for images 6, 7 and 8.
1st run:
0.jpg :: 3.149, 0.000, 0.000, 0.000, 1.586, 0.000, 0.000, 0.755, 0.000, 4.749,
1.jpg :: 2.680, 0.000, 0.000, 0.560, 0.970, 0.000, 0.000, 1.083, 0.000, 4.420,
2.jpg :: 2.680, 0.000, 0.000, 0.560, 0.970, 0.000, 0.000, 1.083, 0.000, 4.420,
3.jpg :: 2.680, 0.000, 0.000, 0.560, 0.970, 0.000, 0.000, 1.083, 0.000, 4.420,
4.jpg :: 3.957, 0.000, 0.000, 0.000, 0.868, 0.000, 0.000, 0.000, 0.000, 6.396,
5.jpg :: 3.179, 0.000, 0.000, 0.000, 0.906, 0.000, 0.000, 0.000, 0.000, 5.508,
6.jpg :: 4.951, 0.000, 0.000, 0.000, 0.000, 0.343, 2.993, 0.000, 0.000, 0.000,
7.jpg :: 4.567, 0.000, 0.000, 0.000, 0.000, 1.251, 2.446, 0.000, 0.000, 0.000,
8.jpg :: 4.951, 0.000, 0.000, 0.000, 0.000, 0.343, 2.993, 0.000, 0.000, 0.000,
9.jpg :: 5.678, 0.000, 0.000, 2.010, 0.000, 1.064, 2.412, 0.000, 0.000, 0.000,
2nd run:
0.jpg :: 2.680, 0.000, 0.000, 0.560, 0.970, 0.000, 0.000, 1.083, 0.000, 4.420,
1.jpg :: 2.680, 0.000, 0.000, 0.560, 0.970, 0.000, 0.000, 1.083, 0.000, 4.420,
2.jpg :: 3.149, 0.000, 0.000, 0.000, 1.586, 0.000, 0.000, 0.755, 0.000, 4.749,
3.jpg :: 2.680, 0.000, 0.000, 0.560, 0.970, 0.000, 0.000, 1.083, 0.000, 4.420,
4.jpg :: 3.957, 0.000, 0.000, 0.000, 0.868, 0.000, 0.000, 0.000, 0.000, 6.396,
5.jpg :: 2.928, 0.000, 0.000, 0.000, 0.769, 0.000, 0.000, 0.000, 0.000, 5.552,
6.jpg :: 4.567, 0.000, 0.000, 0.000, 0.000, 1.251, 2.446, 0.000, 0.000, 0.000,
7.jpg :: 4.567, 0.000, 0.000, 0.000, 0.000, 1.251, 2.446, 0.000, 0.000, 0.000,
8.jpg :: 4.951, 0.000, 0.000, 0.000, 0.000, 0.343, 2.993, 0.000, 0.000, 0.000,
9.jpg :: 5.678, 0.000, 0.000, 2.010, 0.000, 1.064, 2.412, 0.000, 0.000, 0.000,
The layer's output differs for the same images and differs between runs! With the ImageData layer the same procedure shows no such problem. The problem also holds for the output of other layers, for example loss3/classifier, so I suspect there might be a bug in the MemoryData layer implementation.
Has anyone noticed this strange behaviour? I have read that cuDNN may produce non-deterministic results, but I ran my model on the CPU. Any thoughts on this are welcome.
Upvotes: 3
Views: 651
I found out what went wrong and I'll post the answer here to help others.
It turns out that GoogLeNet requires 224x224x3 input images, and you must NOT subtract the mean in the TEST phase. So by changing the definition of the MemoryData layer in the .prototxt file to this:
name: "GoogleNet"
layer {
  name: "data"
  type: "MemoryData"
  top: "data"
  top: "label"
  memory_data_param {
    batch_size: 1
    channels: 3
    height: 224
    width: 224
  }
}
...
I got the results I expected. Many thanks to @Miki for pointing me to the OpenCV tutorial on their dnn module, which helped me clarify this.
Upvotes: 2