Reputation: 23
I have a dataset and among the columns there are column A that have the release year of products and column B that have the sales of each product.
I want to know how many product have sales above the mean for each year.
The dataset is a pandas dataframe.
Thank you and I hope my question is clear
Upvotes: 0
Views: 53
Reputation: 41407
Compute yearly averages with groupby.transform()
and compare them against the individual sales, e.g.:
df = pd.DataFrame({'product': np.random.choice(['foo','bar'], size=10), 'year': np.random.choice([2019,2020,2021], size=10), 'sales': np.random.randint(10000, size=10)})
# product year sales
# 0 foo 2019 7507
# 1 bar 2019 9186
# 2 foo 2021 6234
# 3 foo 2021 7375
# 4 bar 2020 9934
# 5 foo 2021 6403
# 6 foo 2021 7729
# 7 foo 2021 1875
# 8 bar 2020 7148
# 9 foo 2019 8163
df['above_mean'] = df.sales > df.groupby(['product','year']).sales.transform('mean')
df.groupby('year', as_index=False).above_mean.sum()
# year above_mean
# 0 2019 1
# 1 2020 1
# 2 2021 4
Upvotes: 1