Przemek Dabek

Reputation: 533

Problem with linear regression and summarizing counts per year

I would like to plot my linear regression model so that the bike sales for each year are summed into a single point, rather than showing up as two separate points as they do now.

This is my code:

import matplotlib.pyplot as plt
from sklearn import linear_model

## Is the purchase of bicycles increasing or decreasing?
plot1 = df.groupby('Year')['Product_Category'].value_counts().rename('count').reset_index()

x = plot1['Year'].values.reshape(-1, 1)
y = plot1['count'].values.reshape(-1, 1)

## linear regression ##
regr = linear_model.LinearRegression()
regr.fit(x, y)
y_pred = regr.predict(x)  # predict for the same years the model was fitted on

# plot #
plt.scatter(x, y, color='black')
plt.plot(x, y_pred, color='blue', linewidth=3)

This is my plot:

enter image description here

Upvotes: 0

Views: 36

Answers (1)

antoine

Reputation: 672

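As far as I can understand from your example, this may be a solution: replace value_counts with count. value_counts returns one row per (Year, Product_Category) pair, so a year with several categories produces several points, whereas count returns a single total per year.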

Example data:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Year': [ 2019, 2019, 2020, 2021], 'Product_Category': ['a', 'b', 'c', 'd']})
print(df)
   Year Product_Category
0  2019                a
1  2019                b
2  2020                c
3  2021                d

Using count returns one row per year:

plot1 = df.groupby('Year')['Product_Category'].count().rename('count').reset_index()
print(plot1)

   Year  count
0  2019      2
1  2020      1
2  2021      1
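
To see why the original grouping gave more than one point for a year, it may help to run both aggregations on the example data above. This is only a minimal sketch; the names per_category and per_year are illustrative and not from the question.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({'Year': [2019, 2019, 2020, 2021],
                   'Product_Category': ['a', 'b', 'c', 'd']})

# value_counts: one row per (Year, Product_Category) pair
per_category = (df.groupby('Year')['Product_Category']
                  .value_counts().rename('count').reset_index())

# count: one row per Year
per_year = (df.groupby('Year')['Product_Category']
              .count().rename('count').reset_index())

print(len(per_category))  # 4 rows, 2019 appears twice
print(len(per_year))      # 3 rows, one per year
```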


plot1 = df.groupby('Year')['Product_Category'].count().rename('count').reset_index()

# x, y
x = plot1['Year'].values.reshape(-1, 1)
y = plot1['count'].values.reshape(-1, 1)

# plot
plt.scatter(x, y, color='black')
plt.plot(x, y, color='blue', linewidth=3)

enter image description here
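
If the goal is still to draw the fitted regression line through the per-year totals, something along these lines might work. It is a sketch under assumptions: it reuses the example data rather than the real bike sales DataFrame, and the output file name bike_sales_trend.png is made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Example data only; replace with the real DataFrame
df = pd.DataFrame({'Year': [2019, 2019, 2020, 2021],
                   'Product_Category': ['a', 'b', 'c', 'd']})

# One total per year
plot1 = df.groupby('Year')['Product_Category'].count().rename('count').reset_index()

x = plot1['Year'].values.reshape(-1, 1)  # scikit-learn expects 2-D features
y = plot1['count'].values                # 1-D target

# Fit on the yearly totals and predict for the same years
regr = LinearRegression()
regr.fit(x, y)
y_pred = regr.predict(x)

# Scatter the yearly totals and draw the fitted line
plt.scatter(x, y, color='black')
plt.plot(x, y_pred, color='blue', linewidth=3)
plt.xticks(plot1['Year'])                # show whole years on the axis
plt.savefig('bike_sales_trend.png')      # hypothetical file name
```

Note that the line is drawn from y_pred, not from y; plotting y would only connect the observed points.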

Upvotes: 1
