Reputation: 457
I am getting into programming networks with Caffe, and since I am used to more comfortable and "lazy" solutions, I am a bit overwhelmed by the problems that can occur.
Right now I am getting this error:
Check failed: status == CUDNN_STATUS_SUCCESS (3 vs. 0) CUDNN_STATUS_BAD_PARAM
This error is widely reported to come from mismatched CUDA or cuDNN versions, so I checked those and they are up to date (CUDA 8.0.61, cuDNN 6.0.21).
Since I only get this error when I add this ReLU layer, I suppose I got one of its parameters wrong:
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "relu1"
}
For completeness, here is the full log output I get:
I0319 09:41:09.484148 6909 solver.cpp:44] Initializing solver from parameters:
test_iter: 10
test_interval: 1000
base_lr: 0.001
display: 20
max_iter: 800
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.04
stepsize: 200
snapshot: 10000
snapshot_prefix: "models/train"
solver_mode: GPU
net: "train_val.prototxt"
I0319 09:41:09.484392 6909 solver.cpp:87] Creating training net from net file: train_val.prototxt
I0319 09:41:09.485164 6909 net.cpp:294] The NetState phase (0) differed from the phase (1) specified by a rule in layer feed2
I0319 09:41:09.485183 6909 net.cpp:51] Initializing net from parameters:
name: "CaffeNet"
state {
phase: TRAIN
}
layer {
name: "feed"
type: "HDF5Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
hdf5_data_param {
source: "train_h5_list.txt"
batch_size: 50
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 1
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "pool1"
top: "relu1"
}
layer {
name: "conv2"
type: "Convolution"
bottom: "relu1"
top: "conv2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "conv2"
top: "ip2"
param {
lr_mult: 1
decay_mult: 1
}
inner_product_param {
num_output: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "sig1"
type: "Sigmoid"
bottom: "ip2"
top: "sig1"
}
layer {
name: "loss"
type: "EuclideanLoss"
bottom: "sig1"
bottom: "label"
top: "loss"
}
I0319 09:41:09.485752 6909 layer_factory.hpp:77] Creating layer feed
I0319 09:41:09.485780 6909 net.cpp:84] Creating Layer feed
I0319 09:41:09.485792 6909 net.cpp:380] feed -> data
I0319 09:41:09.485819 6909 net.cpp:380] feed -> label
I0319 09:41:09.485836 6909 hdf5_data_layer.cpp:80] Loading list of HDF5 filenames from: train_h5_list.txt
I0319 09:41:09.485860 6909 hdf5_data_layer.cpp:94] Number of HDF5 files: 1
I0319 09:41:09.486469 6909 hdf5.cpp:32] Datatype class: H5T_FLOAT
I0319 09:41:09.500986 6909 net.cpp:122] Setting up feed
I0319 09:41:09.501011 6909 net.cpp:129] Top shape: 50 227 227 3 (7729350)
I0319 09:41:09.501027 6909 net.cpp:129] Top shape: 50 1 (50)
I0319 09:41:09.501039 6909 net.cpp:137] Memory required for data: 30917600
I0319 09:41:09.501051 6909 layer_factory.hpp:77] Creating layer conv1
I0319 09:41:09.501080 6909 net.cpp:84] Creating Layer conv1
I0319 09:41:09.501087 6909 net.cpp:406] conv1 <- data
I0319 09:41:09.501101 6909 net.cpp:380] conv1 -> conv1
I0319 09:41:09.880740 6909 net.cpp:122] Setting up conv1
I0319 09:41:09.880765 6909 net.cpp:129] Top shape: 50 1 225 1 (11250)
I0319 09:41:09.880781 6909 net.cpp:137] Memory required for data: 30962600
I0319 09:41:09.880808 6909 layer_factory.hpp:77] Creating layer pool1
I0319 09:41:09.880836 6909 net.cpp:84] Creating Layer pool1
I0319 09:41:09.880846 6909 net.cpp:406] pool1 <- conv1
I0319 09:41:09.880861 6909 net.cpp:380] pool1 -> pool1
I0319 09:41:09.880888 6909 net.cpp:122] Setting up pool1
I0319 09:41:09.880899 6909 net.cpp:129] Top shape: 50 1 224 0 (0)
I0319 09:41:09.880913 6909 net.cpp:137] Memory required for data: 30962600
I0319 09:41:09.880921 6909 layer_factory.hpp:77] Creating layer relu1
I0319 09:41:09.880934 6909 net.cpp:84] Creating Layer relu1
I0319 09:41:09.880941 6909 net.cpp:406] relu1 <- pool1
I0319 09:41:09.880952 6909 net.cpp:380] relu1 -> relu1
F0319 09:41:09.881192 6909 cudnn.hpp:80] Check failed: status == CUDNN_STATUS_SUCCESS (3 vs. 0) CUDNN_STATUS_BAD_PARAM
EDIT: I tried setting the solver mode to CPU, but I still get this error.
Upvotes: 1
Views: 2811
Reputation: 26
This error is thrown because your blobs have no more room to "shrink". From your log: Top shape: 50 1 224 0 (0). This shows that the output of pool1 has size 0 in one dimension.
To fix this error, you can tweak some of the parameters, including (S)tride, (K)ernel size, and (P)adding. To calculate the dimensions of your next layer (W_new), you can use the formula:
W_new = (W_old - K + 2*P)/S + 1
So, if we have an input that is 227x227x3 and our first layer has K = 5, S = 2, P = 1, and numOutputs = N, then conv1 has dimensions:
(227 - 5 + 2*1)/2 + 1 = 113, i.e. 113x113xN.
Note: if the numerator is not evenly divisible by S, Caffe rounds the division down for convolution layers and up for pooling layers before adding 1.
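As a quick sanity check, the formula can be wrapped in a couple of small helpers. This is only a sketch in plain Python, not Caffe code: the names conv_out and pool_out are made up for illustration, and the rounding (down for convolution, up for pooling) is an assumption about how Caffe computes these sizes.

```python
import math

def conv_out(w_old, k, s=1, p=0):
    """Output size of a convolution along one axis (division rounded down)."""
    return (w_old - k + 2 * p) // s + 1

def pool_out(w_old, k, s=1, p=0):
    """Output size of a pooling layer along one axis (division rounded up)."""
    return math.ceil((w_old - k + 2 * p) / s) + 1

# 227 wide input with K = 5, S = 2, P = 1
print(conv_out(227, k=5, s=2, p=1))  # 113
# A 2-wide pooling kernel on a 1-wide input leaves nothing
print(pool_out(1, k=2, s=1))         # 0
```

Running each layer's kernel size, stride and padding through these helpers before training shows where a dimension collapses to 0.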
Edit: The error most likely only shows up at the ReLU layer because that is the first layer that receives the empty blob. It has nothing to pass through, so it throws an error.
Upvotes: 1
Reputation: 457
I found out one of the problems.
I0319 09:41:09.880765 6909 net.cpp:129] Top shape: 50 1 225 1 (11250)
I0319 09:41:09.880781 6909 net.cpp:137] Memory required for data: 30962600
I0319 09:41:09.880808 6909 layer_factory.hpp:77] Creating layer pool1
I0319 09:41:09.880836 6909 net.cpp:84] Creating Layer pool1
I0319 09:41:09.880846 6909 net.cpp:406] pool1 <- conv1
I0319 09:41:09.880861 6909 net.cpp:380] pool1 -> pool1
I0319 09:41:09.880888 6909 net.cpp:122] Setting up pool1
I0319 09:41:09.880899 6909 net.cpp:129] Top shape: 50 1 224 0 (0)
As you can see, the first convolutional layer takes an input of size (50 227 227 3), which is problematic because Caffe assumes that the second dimension holds the channels.
It's only natural that the convolutional layer butchers the dimensions this way, so no layer after it gets proper input dimensions.
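To make this concrete, here is a small sketch in plain Python (the helper names are made up, this is not Caffe code) that pushes both layouts through conv1 (kernel 3, stride 1) and pool1 (kernel 2, stride 1) from the net above. It assumes that Caffe reads the blob as batch x channels x height x width and rounds pooling sizes up.

```python
import math

def conv_out(size, k, s=1, p=0):
    """Convolution output size along one axis (division rounded down)."""
    return (size - k + 2 * p) // s + 1

def pool_out(size, k, s=1, p=0):
    """Pooling output size along one axis (division rounded up)."""
    return math.ceil((size - k + 2 * p) / s) + 1

def trace(height, width):
    """Return the (height, width) after conv1 and after pool1."""
    conv1 = (conv_out(height, 3), conv_out(width, 3))
    pool1 = (pool_out(conv1[0], 2), pool_out(conv1[1], 2))
    return conv1, pool1

# Blob 50 x 227 x 227 x 3 read as N x C x H x W: height 227, width 3
print(trace(227, 3))    # ((225, 1), (224, 0))
# Blob 50 x 3 x 227 x 227: height 227, width 227
print(trace(227, 227))  # ((225, 225), (224, 224))
```

The first result matches the shapes in the log (50 1 225 1, then 50 1 224 0); the second is what the layers see once the channels come second.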
I managed to solve the problem by simply reshaping the input this way:
layer {
  name: "reshape"
  type: "Reshape"
  bottom: "data"
  top: "res"
  reshape_param {
    shape {
      dim: 50
      dim: 3
      dim: 227
      dim: 227
    }
  }
}
The first dimension here is the batch size, so remember to set this dim to 1 in the .prototxt file for the classification phase (since that one won't work with batches). The first convolutional layer then has to read from "res" instead of "data".
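One thing worth double-checking with this approach: as far as I know, a Reshape layer only changes the blob's dimensions and leaves the values where they are in memory. If the HDF5 file really stores each image as height x width x channels, the reshaped blob has a valid shape, but its three "channels" would be interleaved slices of the original pixels. Transposing the arrays before writing the HDF5 file would avoid that. The sketch below illustrates the difference on a tiny 2x2 image in plain Python; all names are made up for the illustration.

```python
# Tiny "image": 2x2 pixels, 3 channels, stored pixel by pixel (H x W x C).
H, W, C = 2, 2, 3
# Each value names its channel (r, g or b) and its pixel index.
hwc_flat = [f"{ch}{p}" for p in range(H * W) for ch in "rgb"]

def reshape_chw(flat):
    """Reinterpret the flat buffer as C x H x W without moving any value."""
    return [[[flat[(c * H + h) * W + w] for w in range(W)]
             for h in range(H)] for c in range(C)]

def transpose_chw(flat):
    """Reorder the values so that plane c really holds channel c."""
    return [[[flat[(h * W + w) * C + c] for w in range(W)]
             for h in range(H)] for c in range(C)]

# First "channel" after a plain reshape: a mix of r, g and b values
print(reshape_chw(hwc_flat)[0])    # [['r0', 'g0'], ['b0', 'r1']]
# First channel after a real transpose: only r values
print(transpose_chw(hwc_flat)[0])  # [['r0', 'r1'], ['r2', 'r3']]
```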
EDIT: I will mark this as the answer since it covers the basic solution to the problem I had and no other solution is in sight. If anyone wants to shed more light on the matter, please do so.
Upvotes: 2