user570593

Reputation: 3520

Why pretraining for convolutional neural networks?

Usually a backpropagation NN has the problem of vanishing gradients. I found that convolutional NNs (CNNs) somehow get rid of this vanishing gradient problem (why?).
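
For context, here is a minimal toy sketch of the effect I mean (plain NumPy, a made-up fully connected net, not from any paper): during backpropagation, each sigmoid layer multiplies the gradient by sigmoid'(z) <= 0.25 and by the weights, so the gradient norm shrinks layer by layer.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # A toy deep net: 10 fully connected sigmoid layers of width 50.
    n_layers, width = 10, 50
    x = rng.standard_normal(width)
    weights = [rng.standard_normal((width, width)) * 0.1 for _ in range(n_layers)]

    # Forward pass, remembering each activation for the backward pass.
    activations = [x]
    for W in weights:
        activations.append(sigmoid(W @ activations[-1]))

    # Backward pass: watch the gradient norm decay layer by layer.
    grad = np.ones(width)  # pretend dLoss/dOutput is all ones
    for W, a in zip(reversed(weights), reversed(activations[1:])):
        grad = W.T @ (grad * a * (1.0 - a))  # chain rule through one sigmoid layer
        print(f"gradient norm: {np.linalg.norm(grad):.2e}")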

Also, some papers discuss pretraining approaches for CNNs. Could somebody explain the following to me?

    (1) the reasons for pretraining in CNNs,
    (2) the problems/limitations of CNNs, and
    (3) any relevant papers discussing the limitations of CNNs?

Thanks in advance.

Upvotes: 0

Views: 1655

Answers (1)

Cylonmath

Reputation: 371

  1. Pretraining is a regularization technique. It improves the generalization accuracy of your model. Since the network is exposed to a large amount of data (we have vast amounts of unlabeled data for many tasks), the weight parameters are carried to a region of parameter space that is more likely to represent the overall data distribution, rather than overfitting a specific subset of it. Neural nets, especially those with high representational capacity and tons of hidden units, tend to overfit your data and are vulnerable to random parameter initializations. Also, since the initial layers are properly initialized in an unsupervised way, the gradient dilution problem is no longer as severe. This is why pretraining is used as an initial step for the supervised task, which is generally carried out with a gradient descent algorithm. See the first sketch after this list for a concrete picture.

  2. CNNs share the same fate as other neural nets. There are too many hyperparameters to tune: input patch size, number of hidden layers, number of feature maps per layer, pooling and stride sizes, normalization windows, learning rate, and others. Thus, model selection is relatively harder than for other ML techniques. Training of large networks is carried out either on GPUs or on clusters of CPUs. The second sketch below shows where each of these hyperparameters appears.
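
To make point 1 concrete, here is a minimal sketch of greedy layer-wise pretraining (my own toy example: tied-weight sigmoid autoencoders, squared reconstruction error, plain gradient descent; a real setup differs in the details). Each layer is first trained to reconstruct unlabeled data, and the learned weights then initialize the supervised network before fine-tuning:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def pretrain_layer(X, n_hidden, lr=0.1, epochs=100):
        """Train W to reconstruct X through a sigmoid bottleneck (tied weights)."""
        n_in = X.shape[1]
        W = rng.standard_normal((n_in, n_hidden)) * 0.01
        for _ in range(epochs):
            H = sigmoid(X @ W)   # encode
            R = H @ W.T          # decode with tied weights
            err = R - X          # reconstruction error
            # Gradient of the squared error wrt W through the decoder path only;
            # a full implementation would also backprop through the encoder.
            W -= lr * (err.T @ H) / len(X)
        return W

    # Unlabeled data: 500 samples, 20 features (synthetic placeholder).
    X = rng.standard_normal((500, 20))

    # Greedy layer-wise pretraining of a two-layer stack.
    pretrained, inputs = [], X
    for n_hidden in (16, 8):
        W = pretrain_layer(inputs, n_hidden)
        pretrained.append(W)
        inputs = sigmoid(inputs @ W)  # encoded data feeds the next stage

    # `pretrained` now holds initial weights for the supervised network,
    # which is then fine-tuned on labeled data with gradient descent.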
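
And for point 2, a short sketch (PyTorch, purely illustrative; every number below is an arbitrary choice rather than a recommendation) of where each of those hyperparameters shows up in a CNN definition:

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        # input patch size: 3 x 32 x 32 (channels x height x width)
        nn.Conv2d(3, 16, kernel_size=5, stride=1, padding=2),   # 16 feature maps
        nn.ReLU(),
        nn.LocalResponseNorm(size=5),                           # normalization window
        nn.MaxPool2d(kernel_size=2, stride=2),                  # pooling and stride sizes
        nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),  # 32 feature maps
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),
        nn.Flatten(),
        nn.Linear(32 * 8 * 8, 10),  # size of the fully connected part
    )

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)    # learning rate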

Upvotes: 3
