natpas
natpas

Reputation: 23

Root Mean Squared Error vs Accuracy Linear Regression

I built a simple linear regression model to predict students' final grade using this dataset https://archive.ics.uci.edu/ml/datasets/Student+Performance.

While my accuracy is very good, the errors seem to be big.

enter image description here

I'm not sure if I'm just not understanding the meaning of the errors correctly or if I made some errors in my code. I thought for the accuracy of 92, the errors should be way smaller and closer to 0.

Here's my code:

data = pd.read_csv("/Users/.../student/student-por.csv", sep=";")

X = np.array(data.drop([predict], 1))
y = np.array(data[predict]) 

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size = 0.1, random_state=42)

linear = linear_model.LinearRegression()

linear.fit(x_train, y_train)

linear_accuracy = round(linear.score(x_test, y_test) , 5)

linear_mean_abs_error = metrics.mean_absolute_error(y_test, linear_prediction)
linear_mean_sq_error = metrics.mean_squared_error(y_test, linear_prediction)
linear_root_mean_sq_error = np.sqrt(metrics.mean_squared_error(y_test, linear_prediction))

Did I make any errors in the code or errors do make sense in this case?

Upvotes: 0

Views: 1767

Answers (1)

griggy
griggy

Reputation: 446

The accuracy metric in sklearn linear regression is the R^2 metric. It essentially tells you the percent of the variation in the dependent variable explained by the model predictors. 0.92 is a very good score, but it does not mean that your errors will be 0. I looked your work and it seems that you used all the numeric variables as your predictors and your target was G3. The code seems fine and the results seem accurate too. In regression tasks it is really hard to get 0 errors. Please let me know if you have any questions. Cheers

Upvotes: 1

Related Questions