

Is it possible to use arbitrary image sizes in caffe?

I know that Caffe has the so-called spatial pyramid pooling (SPP) layer, which enables networks to use arbitrary image sizes. The problem I have is that the network seems to refuse to use arbitrary image sizes within a single batch. Am I missing something, or is this the real problem?

My train_val.prototxt:

name: "digits"
layer {
  name: "input"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "/Users/rvaldez/Documents/Datasets/Digits/SeperatedProviderV3_1020_batchnormalizedV2AndSPP/1/caffe/train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "input"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "/Users/rvaldez/Documents/Datasets/Digits/SeperatedProviderV3_1020_batchnormalizedV2AndSPP/1/caffe/test_lmdb"
    batch_size: 10
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "bn1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "spatial_pyramid_pooling"
  type: "SPP"
  bottom: "conv2"
  top: "pool2"
  spp_param {
    pyramid_height: 2
  }
}
layer {
  name: "bn2"
  type: "BatchNorm"
  bottom: "pool2"
  top: "bn2"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn2"
  type: "BatchNorm"
  bottom: "pool2"
  top: "bn2"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "bn2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

Link to another question regarding a subsequent problem.

Upvotes: 2

Views: 303

Answers (1)



You are mixing several concepts here.

Can a net accept arbitrary input shapes?
Well, not all nets can work with any input shape. In many cases a net is restricted to the input shape for which it was trained.
In most cases, when using fully-connected layers ("InnerProduct"), these layers expect an exact input dimension. Changing the input shape therefore "breaks" these layers and restricts the net to a specific, pre-defined input shape.
On the other hand, "fully convolutional nets" are more flexible with regard to input shape and can usually process any input shape.
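
To make the dimension argument concrete, here is a rough sketch that traces the spatial size through the conv1/pool1/conv2 stack from the question and counts the values an "InnerProduct" layer would receive. The input sizes 28x28 and 32x32 are assumptions chosen for illustration, the helper names are hypothetical, and the rounding rules (floor for convolution, ceil for pooling) reflect my understanding of Caffe's conventions rather than anything stated in the question.

```python
import math

def conv_out(size, kernel, stride=1, pad=0):
    # Convolution output size, rounded down (assumed Caffe convention).
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride, pad=0):
    # Pooling output size, rounded up (assumed Caffe convention).
    return math.ceil((size + 2 * pad - kernel) / stride) + 1

def flattened_features(height, width, channels=50):
    # conv1 (5x5, stride 1) -> pool1 (2x2, stride 2) -> conv2 (5x5, stride 1),
    # then flattened directly, i.e. WITHOUT the SPP layer.
    dims = []
    for size in (height, width):
        size = conv_out(size, 5)
        size = pool_out(size, 2, 2)
        size = conv_out(size, 5)
        dims.append(size)
    return channels * dims[0] * dims[1]

def spp_features(channels=50, pyramid_height=2):
    # Level i of the pyramid has 2^i x 2^i bins, independent of input size.
    return channels * sum(4 ** level for level in range(pyramid_height))

print(flattened_features(28, 28))  # 3200
print(flattened_features(32, 32))  # 5000
print(spp_features())              # 250
```

Without SPP the flattened size changes with the image size, which is what a fully-connected layer cannot tolerate; with SPP the size stays fixed, so the architecture itself is not the limiting factor here.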

Can one change input shape during batch training?
Even if your net architecture allows for arbitrary input shape, you cannot use whatever shape you want during batch training because the input shape of all samples in a single batch must be the same: How can you concatenate a 27x27 image with another of shape 17x17?

It seems like the error you are getting is from the "Data" layer that is struggling with concatenating samples of different shapes into a single batch.
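
The batching constraint can be illustrated with a small sketch. A batch is stored as a single N x C x H x W blob, so every sample has to agree on C, H and W. The function `batch_shape` below is a hypothetical helper, not part of Caffe, and the shapes are made up for the example.

```python
def batch_shape(sample_shapes):
    # sample_shapes: list of (channels, height, width) tuples, one per sample.
    # Returns the (N, C, H, W) shape of the batch blob, or raises if the
    # samples cannot be stacked into one rectangular array.
    if not sample_shapes:
        raise ValueError("empty batch")
    first = sample_shapes[0]
    for index, shape in enumerate(sample_shapes):
        if shape != first:
            raise ValueError(
                "sample %d has shape %r, expected %r" % (index, shape, first)
            )
    return (len(sample_shapes),) + tuple(first)

print(batch_shape([(1, 27, 27)] * 3))  # (3, 1, 27, 27)

try:
    batch_shape([(1, 27, 27), (1, 17, 17)])
except ValueError as err:
    print("cannot batch:", err)
```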

You can resolve this issue by setting batch_size: 1, so that one sample is processed at a time, and setting iter_size: 32 in your solver.prototxt. The gradients are then averaged over 32 samples, which gives the SGD effect of batch_size: 32.
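
Why this gives the same update can be seen in a toy sketch: for a loss that is averaged over the batch, the mini-batch gradient is the mean of the per-sample gradients, so accumulating 32 single-sample gradients and dividing by 32 yields the same value. The one-parameter model, the data, and the function names below are assumptions for illustration only; this is not Caffe code.

```python
import math

def grad_single(w, x, y):
    # Gradient w.r.t. w of the per-sample loss 0.5 * (w * x - y) ** 2.
    return (w * x - y) * x

def grad_batch(w, samples):
    # One pass over a batch of len(samples): mean of the per-sample gradients.
    return sum(grad_single(w, x, y) for x, y in samples) / len(samples)

def grad_accumulated(w, samples, iter_size):
    # iter_size passes with batch_size 1: accumulate, then normalize once.
    accumulated = 0.0
    for x, y in samples[:iter_size]:
        accumulated += grad_single(w, x, y)
    return accumulated / iter_size

samples = [(float(i), 2.0 * i + 1.0) for i in range(32)]
same = math.isclose(grad_batch(0.5, samples), grad_accumulated(0.5, samples, 32))
print(same)  # True
```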

Upvotes: 4
