Andrea

Reputation: 39

How to reduce MSE and improve R2 in a Linear Regression model

I'm trying to work through an example of a linear regression model in Python. The aim is to find a linear relationship between two features in my dataset, 'Year' and 'Obesity (%)'. I want to train my model to predict the future trend of obesity in the world. The problem is that my MSE is too high and my R2 is too low. How can I improve my model?

This is the link where I found the dataset: Obesity-cleaned.csv

CODE


#Analysis of obesity by country

import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
import numpy as np
import sklearn
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing

address = 'C:/Users/Andre/Desktop/Python/firstMN/obesity-cleaned.csv'
dt = pd.read_csv(address)

#eliminate superfluous rows (no obesity data)
dt.drop(dt[dt['Obesity (%)'] == 'No data'].index, inplace=True)

#keep only the numeric value before the first space and convert the column to float
dt['Obesity (%)'] = dt['Obesity (%)'].str.split().str[0].astype(float)

obMean = dt['Obesity (%)'].mean()
print('%0.3f' % obMean, '\n')

group = dt.groupby('Country')


print(group[['Year', 'Obesity (%)']].mean(), '\n') 

dt1 = dt[dt['Sex'] == 'Both sexes']   

print(dt1[dt1['Obesity (%)'] == dt1['Obesity (%)'].max()], '\n')   

sb.lmplot(x='Year', y='Obesity (%)', data=dt1)
plt.show()

#linear regression predictions

group1 = dt1.groupby('Year')

#mean obesity (both sexes) per year, 1975-2016
yearly = group1['Obesity (%)'].mean()

x1 = yearly.index.to_numpy().reshape(-1, 1)   #years as the single feature
y1 = yearly.to_numpy().reshape(-1, 1)         #mean obesity per year

lr = LinearRegression(fit_intercept=False)
lr.fit(x1, y1)

plt.plot(yearly.index, yearly.values)
plt.show()

print('Coefficients: ', lr.coef_)  
print("Intercept: ", lr.intercept_ )

y_hat = lr.predict(x1)
print('MSE: ', sklearn.metrics.mean_squared_error(y1, y_hat)) 
print('R^2: ', lr.score(x1, y1) ) 
print('var: ', y1.var())

OUTPUT

Coefficients:  [[0.00626604]]
Intercept:  0.0
MSE:  15.09451970012738
R^2:  0.03779706109503678
var:  15.687459567838905 

Correlation among years and obesity (%) is:  (0.9960492544111168, 1.0885274634054143e-43)
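(This correlation line is not printed by the code shown above; a pair in this (r, p-value) form would typically come from scipy.stats.pearsonr. A sketch, assuming the x1 and y1 arrays built above:)

from scipy import stats

#Pearson correlation between year and mean obesity (both sexes)
r, p = stats.pearsonr(x1.ravel(), y1.ravel())
print('Correlation among years and obesity (%) is: ', (r, p))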

Upvotes: 1

Views: 2831

Answers (2)

Barrett Duna

Reputation: 36

Remove fit_intercept=False from your code. If the true intercept really is zero, the fitted intercept will come out approximately zero anyway, so there is no need to set fit_intercept to False. You're essentially constraining the model without, to my knowledge, any reason to do so (correct me if I'm wrong).

From the scikit-learn documentation on the linear regression:

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

I don't see the data being centered anywhere in your code, so your results are flawed. To remedy the situation, simply remove fit_intercept=False, since it is True by default.
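A minimal sketch of the fix, assuming the x1 and y1 arrays built in the question:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

lr = LinearRegression()   #fit_intercept=True is the default
lr.fit(x1, y1)

y_hat = lr.predict(x1)
print('Coefficients: ', lr.coef_)
print('Intercept: ', lr.intercept_)   #no longer forced to 0
print('MSE: ', mean_squared_error(y1, y_hat))
print('R^2: ', lr.score(x1, y1))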

Upvotes: 2

desertnaut

Reputation: 60318

Forcing fit_intercept=False is a huge constraint for the model, and you should be sure that you know exactly what you are doing before deciding to do so.

Fitting without an intercept in simple linear regression practically means that, when our single feature X is 0, the response Y should also be 0; here, it means that in the "year 0" (whatever that may mean), the Obesity should also be 0. Given that, the poor results reported are hardly a surprise (ML is not magic, and it is certainly implied that we include realistic assumptions in our models).

It's not clear here why you have decided to do so, but I highly doubt it is what you intended to do. You should remove this unnecessary constraint from your model.
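If you did want to keep fit_intercept=False, the condition quoted from the documentation in the other answer means you would have to center the data first. A hypothetical sketch, again assuming the question's x1 and y1 arrays:

#mean-center both the years and the obesity values so a zero intercept is justified
x_c = x1 - x1.mean()
y_c = y1 - y1.mean()

lr = LinearRegression(fit_intercept=False)
lr.fit(x_c, y_c)
print('Slope: ', lr.coef_)   #same slope as an ordinary fit with an intercept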

Upvotes: 4
