Reputation: 3803
This is definitely a newbie question about a video classification task in Caffe.
I have a neural network that I have to train on videos (groups of images). I can choose the shape of the network's input from several options.
In all cases, I assume that the network architecture (arrangement and number of layers) and the learning parameters (LR/decay/regularization/etc.) are held constant.
For example, I could give my input to the network in one of the following shapes:
1) batch_size x (no_of_imgs*no_of_channels) x height x width {3 dimensions per sample}
2) batch_size x no_of_imgs x no_of_channels x height x width {4 dimensions per sample}
3) batch_size x no_of_channels x no_of_imgs x height x width {4 dimensions per sample}
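For reference, all three layouts can be produced from the same decoded clips with a reshape or a transpose. This is a minimal NumPy sketch; the sizes are hypothetical, and it assumes the clips are loaded in the order of option 2 (frames, then channels). Option 3 is the channel-first order that Caffe's Convolution layer reads with its default `axis: 1`.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch_size, no_of_imgs, no_of_channels, height, width = 2, 16, 3, 112, 112

# Assumption: decoded clips arrive as batch x frames x channels x H x W (option 2).
rng = np.random.default_rng(0)
clips = rng.random((batch_size, no_of_imgs, no_of_channels, height, width), dtype=np.float32)

# Option 1: fold frames into the channel axis; channel k holds frame k // 3, colour k % 3.
layout1 = clips.reshape(batch_size, no_of_imgs * no_of_channels, height, width)

# Option 2: frames first, then channels (the array as loaded).
layout2 = clips

# Option 3: channels first, then frames (swap axes 1 and 2, then make it contiguous).
layout3 = np.ascontiguousarray(clips.transpose(0, 2, 1, 3, 4))

print(layout1.shape)  # (2, 48, 112, 112)
print(layout2.shape)  # (2, 16, 3, 112, 112)
print(layout3.shape)  # (2, 3, 16, 112, 112)
```

The data values are identical in every layout; only the order of the axes, and therefore which axes a convolution kernel slides over, changes.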
How would the input shape influence the accuracy of the network?
Upvotes: 2
Views: 176
Reputation: 40506
I would definitely advise you to choose the second setup. In this case you can make use of the different spatial and spectral (colour-channel) properties and invariances of the images, which might help the network learn better when using convolutional architectures. In the first setup, much of both the spatial and the spectral information is lost. In the third, a little less is lost, but some spectral information still might be, which may harm your learning process.
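To make the difference between the first and third setups concrete, here is a rough shape calculation for the first convolution layer. It is only a sketch under assumed settings (stride 1, no padding, 64 filters, 3x3 or 3x3x3 kernels, 16 RGB frames of 112x112, all hypothetical), using a hypothetical helper `conv_output_shape` that follows the usual channel-first convention in which every filter spans all input channels.

```python
def conv_output_shape(input_shape, num_output, kernel):
    """Return (output_shape, weight_shape) for a stride-1, unpadded convolution.

    input_shape is channel-first: (N, C, *spatial). kernel gives one size per
    spatial axis. Every filter spans all C input channels, so the C axis is
    summed away and replaced by num_output.
    """
    n, c, *spatial = input_shape
    if len(kernel) != len(spatial):
        raise ValueError("kernel needs one size per spatial axis")
    out_spatial = tuple(s - k + 1 for s, k in zip(spatial, kernel))
    weight_shape = (num_output, c, *kernel)
    return (n, num_output, *out_spatial), weight_shape


# Setup 1: 16 RGB frames folded into 48 channels, 2D 3x3 kernels.
print(conv_output_shape((2, 16 * 3, 112, 112), 64, (3, 3)))
# ((2, 64, 110, 110), (64, 48, 3, 3)): no frame axis is left after layer 1

# Setup 3: 3 channels x 16 frames, 3D 3x3x3 kernels.
print(conv_output_shape((2, 3, 16, 112, 112), 64, (3, 3, 3)))
# ((2, 64, 14, 110, 110), (64, 3, 3, 3, 3)): a frame axis of length 14 survives
```

In the first setup each filter sees all 48 frame-and-colour planes at once, so frames and channels are fused in the very first layer; in the third, the 3D kernel also slides along the frame axis, so temporal structure is still available to later layers.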
Upvotes: 1