I applied linear regression to some features to predict a target, using 10-fold cross-validation.
MinMax scaling was applied to both the features and the target.
Then the features were standardized.
When I run the model, R² is 0.65 and the MSE is 0.02.
But when I use the target as is, without MinMax scaling, I get the same R², but the MSE increases a lot, to 18.
My question is: do we have to treat the target the same way as the features in terms of data preprocessing? And which of the values above is correct? The MSE gets much bigger without scaling the target.
Some people say we have to scale the target too, while others say no.
Thanks in advance.
Whether or not you scale your target changes the 'meaning' of your error, because MSE is expressed in the squared units of the target. For example, consider two different targets, one in the range [0, 100] and another in [0, 10000]. If you fit models to them without scaling, an MSE of 20 means very different things for the two models: in the former case it is a sizeable error, while in the latter it is quite small.
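To put numbers on that example: RMSE (the square root of MSE) is in the target's own units, so dividing it by the target's range gives a rough unitless sense of how big the error is. This is a minimal sketch; the helper name `rmse_relative_to_range` is hypothetical, and normalizing by the range is just one convenient choice for illustration.

```python
import math

def rmse_relative_to_range(mse: float, low: float, high: float) -> float:
    """Hypothetical helper: RMSE divided by the target's range (a unitless fraction)."""
    return math.sqrt(mse) / (high - low)

mse = 20.0
for low, high in [(0, 100), (0, 10_000)]:
    rel = rmse_relative_to_range(mse, low, high)
    # Same MSE, but the error is a very different share of each target's range.
    print(f"target range [{low}, {high}]: RMSE = {math.sqrt(mse):.2f}, "
          f"i.e. {rel:.4%} of the range")
```

The same MSE of 20 is roughly 4.5% of the first range but only about 0.045% of the second.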
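So it is not surprising that you get a much lower MSE with the target in [0, 1] than with the original target: MinMax scaling divides every residual by (max − min), so the MSE shrinks by a factor of (max − min)². With your numbers, 18 / 0.02 = 900, which suggests the original target spans roughly 30 units.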
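A small simulation can confirm this. The sketch below uses hypothetical synthetic data (not the asker's), NumPy only, and a hand-written 10-fold loop for ordinary least squares with an intercept, standing in for whatever library the question used. For simplicity the target's min and max are taken from the whole dataset; fitting the scaler inside each training fold would make the ratio approximate rather than exact.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 500 samples, 3 features, linear signal plus noise.
n, p = 500, 3
X = rng.normal(size=(n, p))
y = 15 + X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=4.0, size=n)

# Standardize the features, as in the question.
X = (X - X.mean(axis=0)) / X.std(axis=0)

def min_max(v):
    """Map v linearly onto [0, 1] using its global min and max."""
    return (v - v.min()) / (v.max() - v.min())

def cv_scores(X, y, k=10):
    """k-fold CV of OLS with an intercept; returns (mean MSE, mean R^2) over folds."""
    idx = np.arange(len(y))
    design = np.column_stack([np.ones(len(y)), X])  # intercept column + features
    mses, r2s = [], []
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        coef, *_ = np.linalg.lstsq(design[train_idx], y[train_idx], rcond=None)
        resid = y[test_idx] - design[test_idx] @ coef
        ss_res = np.sum(resid ** 2)
        ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
        mses.append(ss_res / len(test_idx))
        r2s.append(1 - ss_res / ss_tot)
    return float(np.mean(mses)), float(np.mean(r2s))

mse_raw, r2_raw = cv_scores(X, y)
mse_scaled, r2_scaled = cv_scores(X, min_max(y))
span = y.max() - y.min()
print(f"raw target:    MSE = {mse_raw:.3f}, R^2 = {r2_raw:.3f}")
print(f"scaled target: MSE = {mse_scaled:.5f}, R^2 = {r2_scaled:.3f}")
print(f"MSE ratio = {mse_raw / mse_scaled:.1f}, (max - min)^2 = {span ** 2:.1f}")
```

Both runs should report the same R², while the MSE ratio matches (max − min)² of the raw target, mirroring the 0.02 vs 18 gap in the question.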
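R², on the other hand, does not depend on the scale of the target: it is one minus the ratio of the residual sum of squares to the total sum of squares (both variance-like quantities), so any scaling factor cancels out.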
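The cancellation can be written out explicitly. Assume the target is rescaled as $y' = (y - a)/s$ with constants $a$ and $s > 0$ (MinMax scaling uses $a = \min y$, $s = \max y - \min y$), and that the predictions rescale the same way, $\hat y' = (\hat y - a)/s$, which holds for linear regression with an intercept. Then $y'_i - \hat y'_i = (y_i - \hat y_i)/s$ and $y'_i - \bar y' = (y_i - \bar y)/s$, so:

```latex
\begin{aligned}
\mathrm{MSE}' &= \frac{1}{n}\sum_i (y'_i - \hat y'_i)^2
  = \frac{1}{s^2}\,\frac{1}{n}\sum_i (y_i - \hat y_i)^2
  = \frac{\mathrm{MSE}}{s^2},\\
R^{2\prime} &= 1 - \frac{\sum_i (y'_i - \hat y'_i)^2}{\sum_i (y'_i - \bar y')^2}
  = 1 - \frac{s^{-2}\sum_i (y_i - \hat y_i)^2}{s^{-2}\sum_i (y_i - \bar y)^2}
  = R^2 .
\end{aligned}
```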
Scaling the target also lets you compare model performance across targets with different ranges, among other things.
Also, for some model types (like neural networks), scaling matters more.
Hope it helps!