Reputation: 615
I am playing around with the "House Sales in King County" dataset, comparing the coefficients of a linear regression, Ridge, and Lasso.
I first do a train/test split, then standardise the data, and then train the three models and compare the coefficients. For most train/test split random seeds the coefficients of the three models are on the same scale and I can compare them. But for some random seeds, some of the coefficients of the linear regression "explode", jumping from values around 10^4-10^5 to something like 10^18.
This only happens for a few coefficients in the linear regression model; the Ridge and Lasso coefficients are unaffected.
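A minimal version of my setup looks roughly like this (the exact feature list, alpha values and seed are just illustrative):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Kaggle "House Sales in King County" CSV; drop the non-numeric and target columns
df = pd.read_csv("kc_house_data.csv")
X = df.drop(columns=["id", "date", "price"])
y = df["price"]

# train/test split -- whether the coefficients explode depends on this seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# standardise using statistics from the training set only
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# fit the three models and compare their coefficients
models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=1.0),
}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    print(name, model.coef_)
```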
I am unsure about why this happens, any tips or pointers?
Upvotes: 1
Views: 730
Reputation: 615
Silly me, the 'explosion' was due to multicollinearity. Among my features I had sqft_living, sqft_above and sqft_below.
Obviously sqft_living = sqft_above + sqft_below, so the design matrix is (almost) exactly singular. Multicollinearity was making the coefficients for these 3 variables wildly unstable; that's why adding regularisation helped.
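Here is a small synthetic sketch of the same effect (toy data, not the actual King County columns): when one feature is an almost exact sum of two others, the unpenalised OLS coefficients blow up, while Ridge's penalty keeps them on a sensible scale.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# x3 equals x1 + x2 up to tiny rounding-style noise, like sqft_living vs its parts
x3 = x1 + x2 + rng.normal(scale=1e-8, size=n)
X = np.column_stack([x1, x2, x3])
y = 3 * x1 + 5 * x2 + rng.normal(scale=0.1, size=n)

# the near-singular design makes the unpenalised coefficients enormous,
# while the Ridge penalty shrinks them back to a reasonable magnitude
print("ols  ", LinearRegression().fit(X, y).coef_)
print("ridge", Ridge(alpha=1.0).fit(X, y).coef_)
```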
Great cautionary tale about the dangers of multicollinearity!
Upvotes: 1