Reputation: 7072
I have a dataframe with let's say 2 columns: dates and doubles
2017-05-01 2.5
2017-05-02 3.5
... ...
2017-05-17 0.2
2017-05-18 2.5
Now I would like to do a groupby and sum with x rows. So i.e. with 6 rows it would return:
2017-05-06 15.6
2017-05-12 13.4
2017-05-18 18.0
Is there a clean solution to do this without running it through a for-loop with something like this:
temp = pd.DataFrame()
j = 0
for i in range(0,len(df.index),6):
temp[df.ix[i]['date']] = df.ix[i:i+6]['value'].sum()
Upvotes: 1
Views: 2123
Reputation: 5389
This is alternative solution using groupby
range of length of the dataframe.
Two columns using agg
df.groupby(np.arange(len(df))//6).agg(lambda x: {'date': x.date.iloc[0],
'value': x.value.sum()})
Multiple columns you can use first
(or last
) for date and sum
for other columns.
group = df.groupby(np.arange(len(df))//6)
pd.concat((group['date'].first(),
group[[c for c in df.columns if c != 'date']].sum()), axis=1)
Upvotes: 1
Reputation: 38415
I guess you are looking for resample. consider this dataframe
rng = pd.date_range('2017-05-01', periods=18, freq='D')
num = np.random.randint(5,size = 18)
df = pd.DataFrame({'date': rng, 'val': num})
df.resample('6D', on = 'date').sum().reset_index()
will return
date val
0 2017-05-01 14
1 2017-05-07 11
2 2017-05-13 16
Upvotes: 3