tomasn4a

Reputation: 615

Linear Regression coefficients 'explode' for a particular train/test split

I am playing around with the "House Sales in King County" dataset, comparing the coefficients of a linear regression, Ridge, and Lasso.

I first do a train/test split, then standardise the data, and then train the three models and compare the coefficients. For most train/test split random seeds, the coefficients of the three models are on the same scale, and I can compare them. But for some random seeds, some of the coefficients of the linear regression "explode", jumping from values around 10^4-10^5 to something like 10^18.

This only happens for a few coefficients in the linear regression model; those for Ridge and Lasso are unaffected.

I am unsure why this happens. Any tips or pointers?

Upvotes: 1

Views: 730

Answers (1)

tomasn4a

Reputation: 615

Silly me, the 'explosion' was due to multicollinearity. I had the following variables in there:

  • sqft_living: Square footage of the living space
  • sqft_above: Square footage of the living space excluding basement
  • sqft_below: Square footage of the basement

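Obviously sqft_living = sqft_above + sqft_below, so the three columns are perfectly collinear. That means the least squares solution is not unique: the coefficients for these 3 variables can grow arbitrarily large as long as they cancel each other out, which is why they were crazy unstable from one split to the next. Ridge and Lasso penalise large coefficients, so they settle on a well-behaved solution. That's why adding regularisation helped.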
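For anyone who wants to see the effect in isolation, here is a minimal sketch on synthetic data (not the actual King County dataset). The column names are borrowed from the list above; the value ranges, the noise level, the target formula, and alpha = 1.0 are all assumptions made up for illustration. It uses plain NumPy and a closed-form ridge solve rather than any particular library's Ridge estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins for the three columns (ranges are made up).
sqft_above = rng.uniform(500, 3000, n)
sqft_below = rng.uniform(0, 1500, n)
sqft_living = sqft_above + sqft_below  # exact linear dependence

X = np.column_stack([sqft_living, sqft_above, sqft_below])
stds = X.std(axis=0)
Xs = (X - X.mean(axis=0)) / stds  # standardising does not remove the dependence

# Made-up target: price driven by living area plus noise, then centred.
y = 100 * sqft_living + rng.normal(0, 1e4, n)
yc = y - y.mean()

# Three columns, but the rank should come out as 2, not 3.
rank = np.linalg.matrix_rank(Xs)
gram = Xs.T @ Xs
cond = np.linalg.cond(gram)  # very large (or inf): the Gram matrix is singular

# Direction along which coefficients can move without changing predictions:
# std_living * living_s - std_above * above_s - std_below * below_s = 0
v = np.array([stds[0], -stds[1], -stds[2]])

# lstsq returns the minimum-norm least squares solution, which is well behaved.
base, *_ = np.linalg.lstsq(Xs, yc, rcond=None)
shifted = base + 1e6 * v  # much larger coefficients, same fit
same_fit = np.allclose(Xs @ base, Xs @ shifted)

# Closed-form ridge: the alpha * I term makes the system invertible.
alpha = 1.0
ridge_coef = np.linalg.solve(gram + alpha * np.eye(3), Xs.T @ yc)

print("rank:", rank)
print("condition number of X^T X:", cond)
print("shifted coefficients give the same predictions:", same_fit)
print("ridge coefficients:", ridge_coef)
```

The shifted coefficients are just as valid a least squares fit as the base ones, so which values a solver returns comes down to floating-point details that can differ between splits. Dropping one of the three columns removes the dependence altogether.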

Great cautionary tale about the dangers of multicollinearity!

Upvotes: 1
