user2480288

Reputation: 639

How to calculate the RMSE on Ridge regression model

I have fit a ridge regression model to a data set (link to the dataset: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data) as below:

from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

y = train['SalePrice']
X = train.drop("SalePrice", axis = 1)

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.30)
ridge = Ridge(alpha=0.1, normalize=True)
ridge.fit(X_train,y_train)
pred = ridge.predict(X_test)

I calculated the MSE using sklearn's metrics module as follows:

import numpy as np
from sklearn.metrics import mean_squared_error
mean = mean_squared_error(y_test, pred) 
rmse = np.sqrt(mean_squared_error(y_test,pred)

I am getting a very large MSE (554084039.54321) and RMSE (21821.8), and I am trying to understand whether my implementation is correct.

Upvotes: 2

Views: 8954

Answers (2)

lesick_pilgrim

Reputation: 1

It's also possible to change the 'squared' parameter of mean_squared_error.

squared: bool, default=True. If True, returns the MSE; if False, returns the RMSE.
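One caveat worth hedging: the 'squared' argument was deprecated in scikit-learn 1.4 in favour of a separate root_mean_squared_error function, so which call works depends on the installed version. The sketch below is one way to stay version-agnostic. The helper name rmse and the sample prices are made up for illustration and do not come from the question.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

try:
    # Available in scikit-learn 1.4 and later.
    from sklearn.metrics import root_mean_squared_error
except ImportError:
    root_mean_squared_error = None


def rmse(y_true, y_pred):
    """Return the RMSE, using the dedicated function when it exists."""
    if root_mean_squared_error is not None:
        return float(root_mean_squared_error(y_true, y_pred))
    # Fallback for older versions: square root of the MSE.
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# Hypothetical prices, for illustration only.
y_true = np.array([200000.0, 150000.0, 320000.0])
y_pred = np.array([210000.0, 140000.0, 300000.0])

print(rmse(y_true, y_pred))
```

Both branches compute the same quantity, so the result should match np.sqrt(mean_squared_error(y_test, pred)) from the question.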

Upvotes: 0

Szymon Maszke

Reputation: 24681

RMSE implementation

Your RMSE implementation is correct, which is easy to verify: it is simply the square root of sklearn's mean_squared_error.

I think you are missing a closing parenthesis though, here to be exact:

rmse = np.sqrt(mean_squared_error(y_test,pred)) # the last one was missing

High error problem

Your MSE is high because the model is not able to capture the relationships between your variables and the target very well. Bear in mind that each error is squared, so being 1000 off in price contributes 1000000 to the sum.
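To make the effect of squaring concrete, here is a small sketch with made-up errors (nine misses of 1000 and one miss of 50000; none of these numbers come from the dataset). It shows how a single large miss can account for nearly all of the MSE.

```python
import numpy as np

# Hypothetical prediction errors in dollars: nine small misses and one large one.
errors = np.array([1000.0] * 9 + [50000.0])

squared_errors = errors ** 2   # being 1000 off becomes 1_000_000 after squaring
mse = squared_errors.mean()
rmse = np.sqrt(mse)            # back in dollars, comparable to the prices

# Fraction of the total squared error that comes from the single largest miss.
largest_share = squared_errors.max() / squared_errors.sum()

print(f"MSE:  {mse:,.0f}")
print(f"RMSE: {rmse:,.0f}")
print(f"Share of squared error from the largest miss: {largest_share:.1%}")
```

The MSE is in squared dollars, which is why it looks enormous next to the prices themselves. The RMSE is the figure to compare against a typical sale price.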

You may want to transform the price to log scale with the natural logarithm (numpy.log). This is common practice, especially for this problem (I assume you are doing House Prices: Advanced Regression Techniques); see the available kernels for guidance. With this approach, you will not get such big values.
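As a rough sketch of what the log scale does to the metric, the snippet below compares the RMSE on raw prices with the RMSE on log prices. The prices are hypothetical, and log_pred only stands in for what a model trained on np.log(y_train) would return from predict.

```python
import numpy as np

# Hypothetical sale prices and predictions, for illustration only.
y_true = np.array([200000.0, 150000.0, 320000.0])
y_pred = np.array([210000.0, 140000.0, 300000.0])

# A model fit on np.log(y_train) predicts log prices directly;
# here log_pred stands in for such predictions.
log_true = np.log(y_true)
log_pred = np.log(y_pred)

rmse_raw = np.sqrt(np.mean((y_true - y_pred) ** 2))      # in dollars
rmse_log = np.sqrt(np.mean((log_true - log_pred) ** 2))  # on the log scale

# np.exp maps log-scale predictions back to dollars.
back_to_dollars = np.exp(log_pred)

print(rmse_raw)
print(rmse_log)
```

On the log scale the error behaves roughly like a relative (percentage) error, so the value is small and does not depend on how expensive the houses are. The two numbers are on different scales and should not be compared with each other directly.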

Last but not least, check the Mean Absolute Error to see that your predictions are not as terrible as they seem.
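A minimal sketch of that check, again on made-up prices rather than the real data: it computes the MAE by hand with numpy (sklearn.metrics.mean_absolute_error gives the same value) and relates it to the average price.

```python
import numpy as np

# Hypothetical sale prices and predictions, for illustration only.
y_true = np.array([200000.0, 150000.0, 320000.0])
y_pred = np.array([210000.0, 140000.0, 300000.0])

abs_errors = np.abs(y_true - y_pred)
mae = abs_errors.mean()                            # average miss in dollars
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))    # penalises large misses more

# MAE as a fraction of the average price gives a sense of scale.
relative_mae = mae / y_true.mean()

print(f"MAE:  {mae:,.0f}")
print(f"RMSE: {rmse:,.0f}")
print(f"MAE relative to mean price: {relative_mae:.1%}")
```

The MAE is never larger than the RMSE, and a wide gap between the two suggests that a few large misses dominate the squared error.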

Upvotes: 3
