Regression model not successful (Python)

Question

I have to create a regression model in Python of energy rating vs. price and see whether price depends on energy rating or not.

Here is my code:

import statsmodels.formula.api as smf

# df is a pandas DataFrame with numeric 'price' and 'energyrating' columns
# Initialise and fit an ordinary least squares model of price on energyrating
model = smf.ols('price ~ energyrating', data=df)

model = model.fit()

One of the parameters I get is negative. Maybe that is the reason for the bad graph, but I am not sure how to improve this.

model.params
# price = 2.004943e+06 + (-3.913381e+05) * energyrating

Intercept       2.004943e+06
energyrating   -3.913381e+05
dtype: float64
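A negative coefficient is not an error in itself: in a straight-line fit it only means that predicted price falls as energyrating rises. To judge whether price depends on energy rating, the fitted results object exposes the slope's p-value (`pvalues`) and the R² of the fit (`rsquared`). A minimal sketch, using synthetic data as a stand-in for the real data set, which is not shown here:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the asker's data (an assumption, not the real data):
# discrete ratings 1-7 and a price that falls linearly with rating, plus noise
rng = np.random.default_rng(0)
ratings = rng.integers(1, 8, size=200)
prices = 2.0e6 - 1.0e5 * ratings + rng.normal(0, 2e5, size=200)
df = pd.DataFrame({'energyrating': ratings, 'price': prices})

model = smf.ols('price ~ energyrating', data=df).fit()

# Slope estimate, its p-value, and the share of price variance explained
slope = model.params['energyrating']
p_value = model.pvalues['energyrating']
r_squared = model.rsquared
print(f"slope={slope:.3g}, p={p_value:.3g}, R^2={r_squared:.3f}")
```

On the real data, a large p-value or a small R² would indicate that a straight line explains little of the price variation; it does not rule out a non-linear relationship. `model.summary()` prints all of these statistics in one table.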

Then I plotted the final model against the data, which was unsuccessful:

import matplotlib.pyplot as plt

# Fitted values for the rows of df (same order as df)
pred = model.predict()

# Plot the regression line against the actual data
plt.figure(figsize=(12, 6))
plt.plot(df['energyrating'], df['price'], 'o')          # scatter plot of the actual data
plt.plot(df['energyrating'], pred, 'r', linewidth=2)    # fitted regression line
plt.xlabel('Energy rating')
plt.ylabel('Price')
plt.title('Energy rating vs. price')

plt.show()

How do I improve this? Is the problem with the data, or is there a logical error I am missing?

Thanks in advance

EDIT:

[Figure: frequency graph of energy rating]

This shows how the energy rating values are distributed.
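A frequency table like the one in that graph can be produced with pandas' `value_counts`; it also confirms whether energyrating takes only a few discrete values. A small sketch on hypothetical ratings (replace the series with `df['energyrating']` for the real data):

```python
import pandas as pd

# Hypothetical ratings standing in for df['energyrating']
ratings = pd.Series([3, 4, 4, 5, 2, 4, 3, 5, 6, 4], name='energyrating')

# Number of rows per distinct rating, ordered by rating value
counts = ratings.value_counts().sort_index()
print(counts)

# counts.plot(kind='bar') would draw this as a frequency bar chart
```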

Azuuu · Accepted Answer

Judging from the plot you gave, a simple linear regression probably cannot capture the relationship between price and energyrating, because price does not monotonically increase or decrease as energyrating increases. I suggest including a quadratic term in energyrating, i.e., adding a new column energyrating * energyrating, or other higher-order transformations you consider reasonable.
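With the statsmodels formula interface, the squared term can be added either as an explicit column, as described above, or inline with the formula's `I(...)` operator. A minimal sketch on hypothetical data where price peaks at a middle rating (the real data set is not shown, so the numbers here are assumptions):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: price rises then falls with rating, so it is not monotonic
rng = np.random.default_rng(1)
ratings = rng.integers(1, 8, size=300)
prices = 1.5e6 - 2.0e5 * (ratings - 4) ** 2 + rng.normal(0, 5e4, size=300)
df = pd.DataFrame({'energyrating': ratings, 'price': prices})

# Option 1: explicit squared column
df['energyrating_sq'] = df['energyrating'] ** 2
quad = smf.ols('price ~ energyrating + energyrating_sq', data=df).fit()

# Option 2: the same model with the square computed inside the formula
quad_inline = smf.ols('price ~ energyrating + I(energyrating ** 2)', data=df).fit()

# Straight-line fit for comparison
linear = smf.ols('price ~ energyrating', data=df).fit()
print(f"linear R^2={linear.rsquared:.3f}, quadratic R^2={quad.rsquared:.3f}")

# Predict on a sorted grid so the fitted curve can be drawn as a smooth line
grid = pd.DataFrame({'energyrating': np.linspace(1, 7, 50)})
grid['energyrating_sq'] = grid['energyrating'] ** 2
curve = quad.predict(grid)
```

To draw it, `plt.plot(grid['energyrating'], curve, 'r')` can replace the straight-line plot from the question. Predicting on a sorted grid matters for a curved fit: plotting predictions against unsorted x values makes the line zig-zag.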

If you are allowed to use models other than linear regression, I suggest simply averaging the price within each energyrating bin (energyrating is discrete, judging from your plot) and plotting the resulting curve, which I think would look nicer.

For example, in pandas:

# Mean price for each distinct energy rating, drawn as a line plot
avg = df.groupby("energyrating")['price'].mean()
avg.plot()
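When averaging per bin, it can help to keep the number of rows in each bin as well, since a mean computed from only a few rows is noisy. A minimal sketch using `agg` on hypothetical data standing in for the asker's `df`:

```python
import pandas as pd

# Hypothetical data standing in for the asker's df
df = pd.DataFrame({
    'energyrating': [1, 1, 2, 2, 2, 3, 3, 4],
    'price': [100, 120, 200, 220, 210, 150, 170, 90],
})

# Mean price and number of rows for each distinct energy rating
summary = df.groupby('energyrating')['price'].agg(['mean', 'count'])
print(summary)

# summary['mean'].plot(marker='o') would draw the per-rating averages as a curve
```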
