Reputation: 11
I'm using the backward function of pycaffe to implement the deconvolution process. I first run the forward pass and get the output data blob of the net, then assign that data blob to the diff blob of the top layer, and then run the backward pass. However, the diffs of the bottom layers are not changed by the backward pass; they are all zeros. I don't know why the diff is not propagated by backward.
def backward(fcn_net, im, name_list_, target_blob):
    # name_list_ holds the layer names from bottom to top, so the backward
    # pass starts at the last (topmost) layer and ends at the first one
    start = name_list_[-1]
    end = name_list_[0]
    print(start, end)
    # seed the top blob's diff with the forward output, then backpropagate
    fcn_net.blobs[start].diff[...] = im[...]
    fcn_net.backward(start=start, end=end)
    return fcn_net.blobs[target_blob].diff
In the code above, name_list_ contains the names of all conv and pooling layers in order, and the backward pass starts from the last conv layer. Thanks!
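For completeness, this is roughly how the helper is called. The prototxt/caffemodel file names and the layer names in name_list_ below are just placeholders, not my real network, and each layer's top blob is assumed to share the layer's name (which is why the same string both indexes net.blobs and is passed as start=):

import caffe
import numpy as np

# placeholder file names -- substitute your own network
fcn_net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)
fcn_net.forward()                                    # full forward pass first

name_list_ = ['conv1_1', 'pool1', 'conv2_1', 'pool2', 'conv3_1']   # bottom to top
top_out = fcn_net.blobs[name_list_[-1]].data.copy()  # forward output to send back
diff = backward(fcn_net, top_out, name_list_, target_blob='conv1_1')
print(np.abs(diff).sum())                            # prints 0.0 -- the problem described above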
Upvotes: 1
Views: 334
Reputation: 66
Running only part of a network with pycaffe using the start= and end= arguments is tricky, but possible. If you can afford to run backward through all layers instead, it is much simpler: just set "force_backward: true", as suggested by Shai.
The following two problems may cause a zero diff in the start layer after net.backward(...):
Caffe (in its C++ part) optimizes out Backward() calls to layers it considers unnecessary. Backpropagation is skipped wherever it is not needed to reach learnable parameters (a layer has no learnable parameters, or lr_mult is 0 for all of them). The most common result is that backpropagation to the data layer is optimized out, since the data layer does not normally need gradients.
If you need the diff on the data layer (for example, for Gatys-type style transfer), add "force_backward: true" to the prototxt containing your network layers; this disables the optimization.
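It is a single top-level line in the net prototxt, outside any layer block; the file name here is just an example:

# my_net.prototxt (example file name)
force_backward: true
# ... layer definitions follow unchanged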
Another problem is automatic Split layers. When loading a network, Caffe automatically inserts a Split layer wherever the top (output) of a layer is used by two or more bottoms (inputs) of other layers. You can check your Caffe log file for layers whose names contain the "_split_" substring.
Example:
Original net (nodes are layers):
data ---> conv1 -> conv2 -> ...
     \-> somelayer -> ...
After loading:
data -> split ---> conv1 -> conv2 -> ...
              \-> somelayer -> ...
After the automatic Split layer is inserted, the top (output) of layer "data" keeps its name, but the bottoms of "conv1" and "somelayer" are renamed to autogenerated names. As a result, when you run backward(...) on a network section starting at layer "conv1", the .diff is only updated in an autogenerated blob. To keep using the original blob names, shift the "start=" argument so it includes the automatic Split layers (for both the forward() and backward() calls).
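A minimal pycaffe sketch of how to inspect this (file and layer names are placeholders): net.blobs lists the autogenerated split blobs, and net.bottom_names shows which, possibly renamed, blob each layer actually reads, so you can pick the right blob to seed and the right start= layer:

import caffe

net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

# blobs created by automatic Split layers carry "_split_" in their names,
# e.g. 'data_data_0_split_0', 'data_data_0_split_1'
print([b for b in net.blobs if '_split_' in b])

# which (possibly autogenerated) blob actually feeds 'conv1'?
print(net.bottom_names['conv1'])

# the autogenerated Split layers also appear in the layer list,
# so they can be included via the start=/end= arguments
print(list(net._layer_names))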
Upvotes: 2
Reputation: 114786
It might be the case that your net does not propagate gradients by default.
Consider setting force_backward in your net prototxt.
See this answer for an example.
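If you prefer to flip the flag from Python rather than editing the prototxt by hand, here is a minimal sketch using Caffe's protobuf bindings (file names are placeholders):

from caffe.proto import caffe_pb2
from google.protobuf import text_format

# load the existing net definition
net_param = caffe_pb2.NetParameter()
with open('deploy.prototxt') as f:
    text_format.Merge(f.read(), net_param)

net_param.force_backward = True      # same effect as editing the prototxt

# write a modified copy and pass it to caffe.Net(...)
with open('deploy_force_backward.prototxt', 'w') as f:
    f.write(text_format.MessageToString(net_param))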
Upvotes: 0