aviss
aviss

Reputation: 2439

Pandas: group by equal range

This is an example of my data frame:

df_lst = [
  {"wordcount": 100, "Stats": 198765, "id": 34},
     {"wordcount": 99, "Stats": 98765, "id": 35},
     {"wordcount": 200, "Stats": 18765, "id": 36},
     {"wordcount": 250, "Stats": 788765, "id": 37},
     {"wordcount": 345, "Stats": 12765, "id": 38},
     {"wordcount": 456, "Stats": 238765, "id": 39},
     {"wordcount": 478, "Stats": 1934, "id": 40},
     {"wordcount": 890, "Stats": 19845, "id": 41},
     {"wordcount": 812, "Stats": 1987, "id": 42}]
df = pd.DataFrame(df_lst)
df.set_index('id', inplace=True)
df.head()

DF:

    Stats   wordcount
id      
34  198765  100
35  98765   99
36  18765   200
37  788765  250
38  12765   345

I'd like to calculate the average Stats for each range of wordcount with a step of 100 so the new data frame looks something like this:

    Average wordcount
    194567  100
    23456   200
    2378    300
    ...

Where 100 means from 0-100 etc. I started writing multiple conditions but have a feeling there is a more efficient way of achieving this. Will appreciate your help.

Upvotes: 2

Views: 617

Answers (2)

BENY
BENY

Reputation: 323226

import math
def roundup(x):
    return int(math.ceil(x / 100.0)) * 100
df['roundup']=df.wordcount.apply(roundup)
df.groupby('roundup').Stats.mean()
Out[824]: 
roundup
100    148765.0
200     18765.0
300    788765.0
400     12765.0
500    120349.5
900     10916.0
Name: Stats, dtype: float64

Upvotes: 2

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210822

use pd.cut() method:

In [92]: bins = np.arange(0, df['wordcount'].max().round(-2) + 100, 100)

In [94]: df.groupby(pd.cut(df['wordcount'], bins=bins, labels=bins[1:]))['Stats'].mean()
Out[94]:
wordcount
100    148765.0
200     18765.0
300    788765.0
400     12765.0
500    120349.5
600         NaN
700         NaN
800         NaN
900     10916.0
Name: Stats, dtype: float64

Upvotes: 5

Related Questions