Devanshi Sukhija

Reputation: 97

Why do we need to preserve the "expected output" during dropout?

I am very confused about why we need to preserve the expected value of the output when performing dropout regularisation. Why does it matter if the mean of the outputs of layer l is different between the training and testing phases? The weights that survive dropout are just slightly scaled versions of themselves, so how does that affect the decision-making power of the neural network?

According to a comment under this question, the output-layer sigmoid might interpret a value as 0 instead of 1 if the activations are not scaled. But the weights that are dropped don't contribute anyway.

Please shed some light on this; I am not able to see the bigger picture of the concept.

Upvotes: 0

Views: 176

Answers (1)

Devanshi Sukhija

Reputation: 97

Found the answer to this, courtesy of Andrew Ng's lecture videos. With inverted dropout, we scale the surviving activations of layer l by 1/keep_prob so that their expected value stays the same as it would be without dropout. That way the inputs to the next layer, and ultimately the cost, have the same expected magnitude during training as at test time, when no dropout is applied, so the learned weights do not need any extra rescaling.
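
To make the scaling concrete, here is a minimal numpy sketch of inverted dropout applied to one layer's activations. The keep_prob value and the activation shape are made up for illustration; they are not from the original question.

    import numpy as np

    np.random.seed(0)

    keep_prob = 0.8                     # probability of keeping a unit (illustrative)
    a = np.random.rand(4, 5)            # hypothetical activations of layer l

    # Inverted dropout: zero out units at random, then divide by keep_prob
    mask = np.random.rand(*a.shape) < keep_prob
    a_dropout = (a * mask) / keep_prob  # dividing by keep_prob preserves the expected value

    # On average the scaled activations match the originals, so the next
    # layer (and the cost) sees the same expected input as without dropout.
    print(a.mean(), a_dropout.mean())

Because each kept unit is divided by keep_prob, the expected value of a_dropout equals that of a, so at test time you simply run the network with no dropout and no additional scaling.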

Upvotes: 0
