edn

Reputation: 2193

Neural Network - Input Normalization

It is common practice to normalize the input values to a neural network to speed up the learning process, especially if the features have very different scales.

In theory, normalization is easy to understand. But I wonder how it is done when the training data set is very large, say 1 million training examples. If the number of features per training example is large as well (say, 100 features per example), two problems pop up at once:

- It will take some time to normalize all training samples.
- The normalized training examples need to be saved somewhere, so we roughly double the required disk space (especially if we do not want to overwrite the original data).
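For context, the disk-space problem above disappears if the normalization statistics are computed in one streaming pass and then applied to each batch as it is loaded, so the normalized data is never written back to disk. A minimal sketch of that idea, assuming NumPy and illustrative function names:

```python
import numpy as np

def streaming_mean_std(batches):
    """Accumulate per-feature mean and std in a single pass over the
    data, so the full dataset never has to fit in memory. Uses running
    sums and sums of squares (adequate unless values are extreme)."""
    n = 0
    s = None   # running sum per feature
    ss = None  # running sum of squares per feature
    for batch in batches:  # each batch: (batch_size, n_features)
        b = np.asarray(batch, dtype=np.float64)
        if s is None:
            s = b.sum(axis=0)
            ss = (b ** 2).sum(axis=0)
        else:
            s += b.sum(axis=0)
            ss += (b ** 2).sum(axis=0)
        n += b.shape[0]
    mean = s / n
    std = np.sqrt(ss / n - mean ** 2)
    return mean, std

def normalize(batch, mean, std):
    """Apply precomputed statistics to a batch at load time;
    nothing normalized is ever persisted to disk."""
    return (np.asarray(batch, dtype=np.float64) - mean) / np.maximum(std, 1e-8)
```

The statistics pass reads each example once, and normalization afterwards is just a cheap elementwise operation per loaded batch.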

How is input normalization solved in practice, especially if the data set is very large?

One option might be to normalize inputs dynamically in memory, per mini-batch, while training. But the normalization statistics would then change from one mini-batch to the next. Would that be tolerable?
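The per-mini-batch idea described above can be sketched in a few lines; this is only an illustration (NumPy, hypothetical function name), and it is essentially what a batch-normalization layer does internally:

```python
import numpy as np

def normalize_minibatch(batch, eps=1e-8):
    """Standardize each feature using only this mini-batch's own
    statistics; the mean/std used will drift from batch to batch."""
    b = np.asarray(batch, dtype=np.float64)
    mean = b.mean(axis=0)       # per-feature mean of this batch only
    std = b.std(axis=0)         # per-feature std of this batch only
    return (b - mean) / (std + eps)  # eps guards against zero variance
```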

Perhaps someone on this platform has hands-on experience with this question. I would really appreciate it if you could share your experience.

Thank you in advance.

Upvotes: 0

Views: 1166

Answers (1)

Aiden Grossman

Reputation: 347

A large number of features actually makes it easier to parallelize normalization of the dataset, so computation is not really an issue. Normalizing a large dataset is easily GPU-accelerated and quite fast, even at the scale you describe: a framework I have written can normalize the entire MNIST dataset in under 10 seconds on a 4-core, 4-thread CPU, and a GPU could easily do it in under 2 seconds.

Memory is the real constraint. For smaller datasets you can hold the entire normalized dataset in memory, but for larger ones, like you mention, you will need to swap out to disk if you normalize the whole dataset up front.

However, with reasonably large batch sizes (about 128 or higher), the per-batch minimums and maximums will not fluctuate that much, depending on the dataset. That lets you normalize each mini-batch right before you train the network on it, though again this depends on the network. I would recommend experimenting on your own datasets and choosing the method that works best.
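Since the answer talks about per-batch minimums and maximums, here is a minimal sketch of that min-max variant, assuming NumPy; the function name and epsilon are illustrative, not from the answer:

```python
import numpy as np

def minmax_normalize(batch, eps=1e-8):
    """Scale each feature of a mini-batch into roughly [0, 1] using the
    batch's own minima and maxima. With batch sizes of ~128 or more,
    these extremes tend to be fairly stable from batch to batch."""
    b = np.asarray(batch, dtype=np.float64)
    lo = b.min(axis=0)                 # per-feature minimum in this batch
    hi = b.max(axis=0)                 # per-feature maximum in this batch
    return (b - lo) / (hi - lo + eps)  # eps avoids division by zero
```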

Upvotes: 1
