Reputation: 4996
I am trying to use the new Object Detection API in TensorFlow 1.2 with the example Faster R-CNN config to train on a custom dataset. The error I get is related to tensor shapes, but it occurs seemingly at random during training, and the exact shapes change too.
INFO:tensorflow:global step 132: loss = 63.3741 (0.262 sec/step)
INFO:tensorflow:global step 133: loss = 33.7362 (0.292 sec/step)
INFO:tensorflow:global step 134: loss = 18.0165 (0.264 sec/step)
INFO:tensorflow:global step 135: loss = 40.5577 (0.266 sec/step)
INFO:tensorflow:global step 136: loss = 24.1086 (0.266 sec/step)
2017-07-10 10:23:49.066345: W tensorflow/core/framework/op_kernel.cc:1165] Invalid argument: Incompatible shapes: [1,60,4] vs. [1,64,4]
[[Node: gradients/Loss/BoxClassifierLoss/Loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape, gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape_1)]]
2017-07-10 10:23:49.066475: W tensorflow/core/framework/op_kernel.cc:1165] Invalid argument: Incompatible shapes: [1,60,4] vs. [1,64,4]
[[Node: gradients/Loss/BoxClassifierLoss/Loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape, gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape_1)]]
2017-07-10 10:23:49.066509: W tensorflow/core/framework/op_kernel.cc:1165] Invalid argument: Incompatible shapes: [1,60,4] vs. [1,64,4]
[[Node: gradients/Loss/BoxClassifierLoss/Loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape, gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape_1)]]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Incompatible shapes: [1,60,4] vs. [1,64,4]
[[Node: gradients/Loss/BoxClassifierLoss/Loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape, gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape_1)]]
[[Node: gradients/FirstStageFeatureExtractor/resnet_v1_50/resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/convolution_grad/tuple/control_dependency_1/_2621 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_13108_gradients/FirstStageFeatureExtractor/resnet_v1_50/resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/convolution_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
As you can see, it runs correctly for a variable number of steps and then fails with Invalid argument: Incompatible shapes: [1,60,4] vs. [1,64,4]. What I don't understand is why this error is triggered, and where the incompatible shape comes from, since it also changes between runs.
As I converted my dataset into the TF format myself, I was unsure whether that was the issue. However, I have successfully trained for several days on the same dataset with their SSD implementation, so I think it is safe to say the data is formatted correctly.
EDIT: The label map file is here. Again, I would like to point out that this same dataset runs perfectly using SSD.
Upvotes: 2
Views: 1747
Reputation: 71
You are reading your sequence examples from tf.train.batch with allow_smaller_final_batch=True. The error is likely caused by the smaller final batch, whose shape is incompatible with the full batch size.
Upvotes: 0
Reputation: 128
You have to set num_classes = xx in the faster_rcnn_resnet101.config file.
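One way to check this suggestion is to compare the num_classes value in the pipeline config with the number of distinct ids in the label map. The sketch below is only an illustration: it uses plain regular expressions rather than the API's own protobuf parsers, the helper names are hypothetical, and the sample config and label map strings are made up for the example.

```python
import re


def read_num_classes(config_text):
    """Return the first num_classes value found in the config text, or None."""
    match = re.search(r"num_classes\s*:\s*(\d+)", config_text)
    return int(match.group(1)) if match else None


def read_label_ids(label_map_text):
    """Return every integer that follows an `id:` field in the label map text."""
    return [int(m) for m in re.findall(r"\bid\s*:\s*(\d+)", label_map_text)]


def check_num_classes(config_text, label_map_text):
    """Compare the configured class count with the distinct ids in the label map."""
    configured = read_num_classes(config_text)
    found = len(set(read_label_ids(label_map_text)))
    return configured == found, configured, found


# Made-up sample inputs; in practice these would be read from the real files.
sample_config = """
model {
  faster_rcnn {
    num_classes: 37
  }
}
"""
sample_label_map = """
item {
  id: 1
  name: 'balloon'
}
"""

ok, configured, found = check_num_classes(sample_config, sample_label_map)
print(ok, configured, found)
```

With the sample inputs, the config declares 37 classes while the label map defines one, so the check reports a mismatch.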
Upvotes: 0
Reputation: 77
You can try to start your class id from 1 instead of 0.
item {
id: 1
name: 'balloon'
}
It worked for me.
Upvotes: 0
Reputation: 1558
The TensorFlow Object Detection API assumes that the '0' label is reserved for 'none_of_the_above', so one immediate thing to do is to add 1 to every label index in your label map.
It's unclear why things fail (in a hard way) for Faster R-CNN and not for SSD (probably something for us to dig into), but I'd be a bit surprised if you got very good results with SSD using that label map.
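As a rough sketch of the "add 1 to every label index" step, the snippet below rewrites the id fields of a label map given as text. It is an assumption-laden illustration: the function name shift_label_ids is hypothetical, the parsing is regex-based rather than using the API's protobuf tools, and the sample label map is invented. The class labels stored in the TFRecord files would need the same shift so that they still match the label map.

```python
import re


def shift_label_ids(label_map_text, offset=1):
    """Add `offset` to every `id:` value, but only if the smallest id is 0."""
    ids = [int(m) for m in re.findall(r"\bid\s*:\s*(\d+)", label_map_text)]
    if not ids or min(ids) >= 1:
        # Nothing to do: ids already start at 1 or higher.
        return label_map_text
    return re.sub(
        r"(\bid\s*:\s*)(\d+)",
        lambda m: m.group(1) + str(int(m.group(2)) + offset),
        label_map_text,
    )


# Invented label map whose ids start at 0.
zero_based = "item {\n  id: 0\n  name: 'a'\n}\nitem {\n  id: 1\n  name: 'b'\n}\n"
shifted = shift_label_ids(zero_based)
print(shifted)
```

A label map whose ids already start at 1 is returned unchanged, so running the function twice does not shift the ids a second time.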
Upvotes: 1