Reputation: 659
A dataframe has a time column with int values that start at zero. I want to group my data frame into 100 groups (for example), where the step is ts = df['time'].max()/100. One naive way to do it is to test whether each value of the 'time' column is greater than t and less than t+ts, where t is a np.linspace vector that starts at 0 and ends at df['time'].max().
Here is what my data frame looks like:
df.head()
   0  1  2           3      time
0  1  1  1  1130165891  59559371
1  2  1  1  1158784502  88177982
2  2  1  1  1158838664  88232144
3  2  1  1  1158838931  88232411
4  2  1  1  1158839132  88232612
Upvotes: 0
Views: 277
Reputation:
You can use pd.cut to generate the groups:
df.groupby(pd.cut(df['time'], 2)).mean()
Out:
                            0  1  2           3      time
time
(59530697.759, 73895991.5]  1  1  1  1130165891  59559371
(73895991.5, 88232612]      2  1  1  1158825307  88218787
This has only 2 groups because the dataset is very small, and the bins start at the minimum of 'time' rather than at zero, since pd.cut with a number of bins spans the range of the data. You can change the number of groups. Instead of passing the number of groups, you can also pass the break points (with or without np.linspace).
df.groupby(pd.cut(df['time'], [0, 6*10**7, np.inf], include_lowest=True)).mean()
Out:
                 0  1  2           3      time
time
[0, 60000000]    1  1  1  1130165891  59559371
(60000000, inf]  2  1  1  1158825307  88218787
I took the mean in both examples to show you how it works. You can use a different method on the groupby object.
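For the 100 equal-width groups starting at zero that the question describes, the two ideas above can be combined: build the break points with np.linspace and pass them to pd.cut. This is only a minimal sketch. The toy frame, the column name "value" and the variable names n_groups, edges and result are made up for illustration and are not from the question's data.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the question's frame (values are illustrative).
df = pd.DataFrame({
    "value": [1, 2, 2, 2, 2],
    "time": [0, 59559371, 88177982, 88232144, 88232612],
})

n_groups = 100
# n_groups + 1 edges from 0 to the maximum give n_groups equal-width bins.
edges = np.linspace(0, df["time"].max(), n_groups + 1)

# labels=False returns the integer bin index (0 .. n_groups - 1) instead of
# an interval; include_lowest=True keeps time == 0 in the first bin.
df["group"] = pd.cut(df["time"], bins=edges, labels=False, include_lowest=True)

# Any groupby method can be used here; mean is just an example.
result = df.groupby("group")["value"].mean()
print(result)
```

Only bins that actually contain rows show up in the result, so with sparse data you will get fewer than 100 groups.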
Upvotes: 2