Reputation: 1916
I have the following situation
date_range = pd.date_range('20180101', '20180105')
date_list = list(itertools.chain.from_iterable(itertools.repeat(date, 2) for date in date_range))
num_list = np.random.randint(1,100,size=(10))
date2 = ['2018-12-31']*10
df = pd.DataFrame({'date1':date_list,'numbers':num_list,'date2':date2})
displaying this dataframe gives
date1 date2 numbers
0 2018-01-01 2018-12-31 38
1 2018-01-01 2018-12-31 2
2 2018-01-02 2018-12-31 8
3 2018-01-02 2018-12-31 51
4 2018-01-03 2018-12-31 16
5 2018-01-03 2018-12-31 22
6 2018-01-04 2018-12-31 43
7 2018-01-04 2018-12-31 76
8 2018-01-05 2018-12-31 47
9 2018-01-05 2018-12-31 50
i would like to obtain a new dataframe which is a) grouped by date1, b) sums up the values for each date1 in the numbers column, and c) keeps the date2 value (which we can assume would be the same for each date1 or, in this case, the same for the entire dataframe
i can do the following to achieve a+b, but if i try to include something like 'date2':'mean' in the aggregation dictionary it will not work and return DataError: No numeric types to aggregate
df.groupby(['date1'],as_index=False).agg({'numbers':'sum'})
any advice?
Upvotes: 4
Views: 8001
Reputation: 862611
It seems you need if date2
is same for each group:
df.groupby(['date1', 'date2'],as_index=False).agg({'numbers':'sum'})
Or need aggregate by first
:
df.groupby(['date1'],as_index=False).agg({'numbers':'sum','date2':'first'})
But if need mean
of datetime
it is a bit complicated:
df['date2'] = pd.to_datetime(df['date2'])
f = lambda x: pd.to_datetime(x.values.astype(np.int64).mean())
df1 = df.groupby(['date1'],as_index=False).agg({'numbers':'sum','date2':f})
print (df1)
date1 numbers date2
0 2018-01-01 159 2018-12-31
1 2018-01-02 104 2018-12-31
2 2018-01-03 75 2018-12-31
3 2018-01-04 98 2018-12-31
4 2018-01-05 184 2018-12-31
Upvotes: 3
Reputation: 9375
Because of c):
keeps the date2 value (which we can assume would be the same for each date1 or, in this case, the same for the entire dataframe
... you can add it to the group by clause:
df.groupby(['date1', 'date2'],as_index=False).agg({'numbers':'sum'})
... giving:
date1 date2 numbers
0 2018-01-01 2018-12-31 35
1 2018-01-02 2018-12-31 52
2 2018-01-03 2018-12-31 104
3 2018-01-04 2018-12-31 96
4 2018-01-05 2018-12-31 151
Upvotes: 0