how to find the number of rows in a column that are above the mean?

Question

I have a dataset and among the columns there are column A that have the release year of products and column B that have the sales of each product.

I want to know how many product have sales above the mean for each year.

The dataset is a pandas dataframe.

Thank you and I hope my question is clear

tdy · Accepted Answer

Compute yearly averages with groupby.transform() and compare them against the individual sales, e.g.:

df = pd.DataFrame({'product': np.random.choice(['foo','bar'], size=10), 'year': np.random.choice([2019,2020,2021], size=10), 'sales': np.random.randint(10000, size=10)})

#   product  year  sales
# 0     foo  2019   7507
# 1     bar  2019   9186
# 2     foo  2021   6234
# 3     foo  2021   7375
# 4     bar  2020   9934
# 5     foo  2021   6403
# 6     foo  2021   7729
# 7     foo  2021   1875
# 8     bar  2020   7148
# 9     foo  2019   8163

df['above_mean'] = df.sales > df.groupby(['product','year']).sales.transform('mean')
df.groupby('year', as_index=False).above_mean.sum()

#    year  above_mean
# 0  2019           1
# 1  2020           1
# 2  2021           4

how to find the number of rows in a column that are above the mean?

Answers (1)

Related Questions