Pandas: group by equal range

Question

This is an example of my data frame:

df_lst = [
  {"wordcount": 100, "Stats": 198765, "id": 34},
     {"wordcount": 99, "Stats": 98765, "id": 35},
     {"wordcount": 200, "Stats": 18765, "id": 36},
     {"wordcount": 250, "Stats": 788765, "id": 37},
     {"wordcount": 345, "Stats": 12765, "id": 38},
     {"wordcount": 456, "Stats": 238765, "id": 39},
     {"wordcount": 478, "Stats": 1934, "id": 40},
     {"wordcount": 890, "Stats": 19845, "id": 41},
     {"wordcount": 812, "Stats": 1987, "id": 42}]
df = pd.DataFrame(df_lst)
df.set_index('id', inplace=True)
df.head()

DF:

    Stats   wordcount
id      
34  198765  100
35  98765   99
36  18765   200
37  788765  250
38  12765   345

I'd like to calculate the average Stats for each range of wordcount with a step of 100 so the new data frame looks something like this:

    Average wordcount
    194567  100
    23456   200
    2378    300
    ...

Where 100 means from 0-100 etc. I started writing multiple conditions but have a feeling there is a more efficient way of achieving this. Will appreciate your help.

MaxU - stand with Ukraine · Accepted Answer

use pd.cut() method:

In [92]: bins = np.arange(0, df['wordcount'].max().round(-2) + 100, 100)

In [94]: df.groupby(pd.cut(df['wordcount'], bins=bins, labels=bins[1:]))['Stats'].mean()
Out[94]:
wordcount
100    148765.0
200     18765.0
300    788765.0
400     12765.0
500    120349.5
600         NaN
700         NaN
800         NaN
900     10916.0
Name: Stats, dtype: float64

Pandas: group by equal range

Answers (2)

Related Questions