kirti purohit

Reputation: 431

Regression model not successful-python

I have to create a regression model in Python of energy ratings vs. price, and see whether energy ratings depend on price or not.

[Image: the data set]

Here are the data set (above) and the code:

import statsmodels.formula.api as smf

# Initialise and fit linear regression model using `statsmodels`
model = smf.ols('price ~ energyrating', data=df)

model = model.fit()

One of the parameters I am getting is negative. Maybe that is the reason for the bad graph, but I am not sure how to improve this.

model.params
# price = 2.004943e+06 + (-3.913381e+05) * energyrating

Intercept       2.004943e+06
energyrating   -3.913381e+05
dtype: float64

Then I plotted the final model, which was unsuccessful:

import matplotlib.pyplot as plt

# Predict values (fitted values for the rows the model was trained on)
pred = model.predict()

# Plot regression against actual data
plt.figure(figsize=(12, 6))
plt.plot(df['energyrating'], df['price'], 'o')           # scatter plot showing actual data
plt.plot(df['energyrating'], pred, 'r', linewidth=2)   # regression line
plt.xlabel('Energy ratings')
plt.ylabel('Price')
plt.title('Energy ratings Vs. Price')

plt.show()

[Image: the resulting plot of the data with the regression line]

How do I improve this? Is the data unstable, or is there a logical error I am missing?

Thanks in advance

EDIT:

[Image: frequency graph of energy rating]

This is how the energy rating is distributed.

Upvotes: 1

Views: 91

Answers (1)

Azuuu

Reputation: 894

From the plot you gave, I think a simple linear regression cannot capture the relationship between price and energyrating, since price does not monotonically increase or decrease as energyrating increases. I suggest including a quadratic term of energyrating, i.e., adding a new column of energyrating * energyrating, or other higher-order transformations you consider reasonable.
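A rough sketch of the quadratic-term idea, sticking with the statsmodels formula API from the question. The data here is synthetic and made up purely for illustration (it is an assumption, not your data set): discrete ratings from 1 to 7 with the price peaking at a middle rating. Wrapping the squared term in I(...) in the formula should be equivalent to adding a separate energyrating * energyrating column.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the real data (an assumption for illustration only):
# discrete ratings 1..7, with price highest at a middle rating.
rng = np.random.default_rng(0)
rating = rng.integers(1, 8, size=300)
price = 2.0e6 - 1.0e5 * (rating - 4) ** 2 + rng.normal(0, 5.0e4, size=300)
df = pd.DataFrame({"energyrating": rating, "price": price})

# Straight-line fit, as in the question
linear = smf.ols("price ~ energyrating", data=df).fit()

# Same model plus a squared term; I(...) makes the formula
# treat ** as ordinary arithmetic rather than formula syntax
quadratic = smf.ols("price ~ energyrating + I(energyrating ** 2)", data=df).fit()

# Compare how much of the variance each model explains
print("linear R^2:   ", linear.rsquared)
print("quadratic R^2:", quadratic.rsquared)
print(quadratic.params)
```

On your own df you would only need the two smf.ols lines. Whether the squared term actually helps depends on the real data, so it is worth comparing the R^2 values (or looking at quadratic.summary()) before settling on it.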

If you are allowed to use models other than linear regression, I suggest you simply average the price within each energyrating bin (energyrating is discrete, judging from your plot) and plot the resulting curve, which I think would look nicer.

For example in pandas:

avg = df.groupby("energyrating")['price'].mean()
avg.plot()

Upvotes: 1
