Devanshi Sukhija

Reputation: 97

Why do we need to preserve the "expected output" during dropout?

I am very confused about why we need to preserve the expected value of the output when performing dropout regularisation. Why does it matter if the mean of the outputs of layer l is different between the training and testing phases? The weights that survive dropout are just slightly scaled versions of themselves, so how does that affect the decision-making power of the neural network?

According to a comment under this question, the output-layer sigmoid might interpret a value as 0 instead of 1 if the activations are not scaled. But the weights that are dropped don't contribute anyway.

Please shed some light on this; I am not able to see the bigger picture of the concept.

Upvotes: 0

Views: 176

Answers (1)

Devanshi Sukhija

Reputation: 97

Found the answer to this, courtesy of Andrew Ng's lecture videos. With inverted dropout, we scale the surviving activations of layer l by 1/keep_prob so that their expected value stays the same as it would be without dropout. That way the inputs to the next layer, and ultimately the cost, have the same expected magnitude during training as at test time, when no dropout is applied, so the learned weights do not need any extra rescaling.
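
To make the scaling concrete, here is a minimal numpy sketch of inverted dropout applied to one layer's activations. The keep_prob value and the activation shape are made up for illustration; they are not from the original question.

    import numpy as np

    np.random.seed(0)

    keep_prob = 0.8                     # probability of keeping a unit (illustrative)
    a = np.random.rand(4, 5)            # hypothetical activations of layer l

    # Inverted dropout: zero out units at random, then divide by keep_prob
    mask = np.random.rand(*a.shape) < keep_prob
    a_dropout = (a * mask) / keep_prob  # dividing by keep_prob preserves the expected value

    # On average the scaled activations match the originals, so the next
    # layer (and the cost) sees the same expected input as without dropout.
    print(a.mean(), a_dropout.mean())

Because each kept unit is divided by keep_prob, the expected value of a_dropout equals that of a, so at test time you simply run the network with no dropout and no additional scaling.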

Upvotes: 0
