Cosmozhang

Reputation: 259

More training data reduces variance

To my understanding, high variance means that the model itself is over-fitting. But in Andrew Ng's video lecture, he mentions that more training data can reduce high variance. What is the detailed reason?

Upvotes: 1

Views: 1894

Answers (2)

tazaree

Reputation: 1

1- A larger training set increases the SNR (signal-to-noise ratio).
2- A higher SNR means that the noise has less influence relative to the signal.
3- When the influence of the noise decreases, the variance of the model decreases.

Please note that the variance comes from the noise (clean data doesn't cause variance in the model).

Upvotes: 0

Noctua

Reputation: 5208

Basically, a model will overfit if it has too much variance (too much flexibility) relative to the training set size.

If you have, say, 5 degrees of freedom, you can perfectly match (fit) 5 samples. But you can't perfectly match 1000 samples.
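A minimal sketch of the degrees-of-freedom point, assuming a degree-4 polynomial (5 coefficients) as the model. The sine target, the noise level of 0.3, and the helper name `train_rmse` are made up for illustration. With 5 samples the polynomial should interpolate them almost exactly; with 1000 samples it can no longer chase the noise, so the training error stays near the noise level.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_rmse(n_samples, degree=4):
    """Fit a polynomial with degree + 1 coefficients; return its training RMSE."""
    x = np.linspace(-1.0, 1.0, n_samples)
    # Hypothetical data: a smooth signal plus Gaussian noise.
    y = np.sin(3.0 * x) + rng.normal(scale=0.3, size=n_samples)
    coeffs = np.polyfit(x, y, deg=degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sqrt(np.mean(residuals ** 2)))

rmse_5 = train_rmse(5)        # 5 coefficients, 5 points: the fit passes through every point
rmse_1000 = train_rmse(1000)  # 5 coefficients, 1000 points: the noise cannot be matched
print(f"training RMSE with 5 samples:    {rmse_5:.2e}")
print(f"training RMSE with 1000 samples: {rmse_1000:.2e}")
```

A near-zero training error on 5 samples is not a good sign here: it means the model has memorised the noise along with the signal.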

So by adding more data samples (and thus hopefully increasing variance in your dataset), you can prevent overfitting.
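To see the effect on variance directly, here is a rough sketch that refits the same model on many independently drawn training sets and measures how much its prediction at one point changes from fit to fit. The data-generating function, the query point `x_query`, and the helper name `prediction_variance` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def prediction_variance(n_samples, degree=4, n_trials=200, x_query=0.5):
    """Variance of the fitted model's prediction at x_query across training sets."""
    preds = []
    for _ in range(n_trials):
        # Draw a fresh (hypothetical) training set each trial.
        x = rng.uniform(-1.0, 1.0, size=n_samples)
        y = np.sin(3.0 * x) + rng.normal(scale=0.3, size=n_samples)
        coeffs = np.polyfit(x, y, deg=degree)
        preds.append(np.polyval(coeffs, x_query))
    return float(np.var(preds))

var_small = prediction_variance(20)    # few samples: each fit follows its own noise
var_large = prediction_variance(1000)  # many samples: fits agree with each other
print(f"prediction variance, n=20:   {var_small:.4f}")
print(f"prediction variance, n=1000: {var_large:.4f}")
```

The model complexity is the same in both runs; only the training set size changes, and the spread of the predictions shrinks as the sample count grows.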

Unfortunately, it's hard to get more data. It's easier to reduce the degrees of freedom.

Upvotes: 3
