Reputation: 2793
Input:
import pandas as pd
data = pd.DataFrame(data={
    'date': [pd.Timestamp('2016-02-15')] * 3,
    'time': [pd.Timedelta(x) for x in ('07:30:00', '10:10:00', '11:10:00')],
    'name': ['A'] * 3,
    'N': [1, 2, 3],
}).set_index(['date', 'time', 'name']).sort_index()
data = data[data.index.get_level_values('time') >= pd.to_timedelta('09:30:00')]
dataGB = data['N'].groupby(['date','name'])
print(data)
print('Number of groups:',len(dataGB))
print(dataGB.sum())
print(pd.__version__)
Output:
>>> print(data)
                          N
date       time     name
2016-02-15 10:10:00 A     2
           11:10:00 A     3
>>> print('Number of groups:',len(dataGB))
Number of groups: 2
>>> print(dataGB.sum())
date 2
name 3
Name: N, dtype: int64
>>> print(pd.__version__)
0.24.1
Questions:
Why does dataGB.sum() return the result above (two groups instead of one), and what should I do to get the expected result below?
Expected result of dataGB.sum():
>>> dataGB.sum()
date        name
2016-02-15  A       5
Name: N, dtype: int64
Thank you for your help!
Upvotes: 2
Views: 1806
Reputation: 153460
This may be a bug in pd.Series.groupby; I will submit a bug report to pandas for this case.
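One possible reading of the output in the question, offered as an assumption rather than a confirmed diagnosis: a plain list with exactly one entry per row can be taken as row-by-row group labels instead of as index level names. The sketch below shows that mechanism on a two-row Series shaped like the one in the question. The labels 'x' and 'y' are hypothetical and chosen so that they cannot match any index level.

```python
import pandas as pd

# Two-row Series with the same index layout as in the question.
idx = pd.MultiIndex.from_tuples(
    [(pd.Timestamp('2016-02-15'), pd.Timedelta('10:10:00'), 'A'),
     (pd.Timestamp('2016-02-15'), pd.Timedelta('11:10:00'), 'A')],
    names=['date', 'time', 'name'])
s = pd.Series([2, 3], index=idx, name='N')

# 'x' and 'y' are hypothetical labels, not index level names.
# The list has one entry per row, so each row lands in its own group.
by_labels = s.groupby(['x', 'y']).sum()
print(by_labels)   # two groups: x -> 2, y -> 3

# Naming the levels explicitly removes the ambiguity.
by_levels = s.groupby(level=[0, 2]).sum()
print(by_levels)   # one group with sum 5
```

If this reading is right, the question's result (date -> 2, name -> 3) follows the same pattern, with the strings 'date' and 'name' used as labels for the two remaining rows.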
Workaround #1: use a pd.DataFrame instead of a pd.Series
data[['N']].groupby(['date','name']).sum()
Output:
                 N
date       name
2016-02-15 A     5
Workaround #2: use the level parameter in groupby
data['N'].groupby(level=[0,2]).sum()
Output:
date        name
2016-02-15  A       5
Name: N, dtype: int64
Workaround #3: group the DataFrame first, then select the column to aggregate:
data.groupby(['date', 'name'])['N'].sum()
Output:
date        name
2016-02-15  A       5
Name: N, dtype: int64
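As a quick check, the three workarounds can be run side by side on the data from the question. This is a minimal sketch; the names w1, w2 and w3 are introduced here only for illustration.

```python
import pandas as pd

# Rebuild the filtered frame from the question.
data = pd.DataFrame({
    'date': [pd.Timestamp('2016-02-15')] * 3,
    'time': [pd.Timedelta(x) for x in ('07:30:00', '10:10:00', '11:10:00')],
    'name': ['A'] * 3,
    'N': [1, 2, 3],
}).set_index(['date', 'time', 'name']).sort_index()
data = data[data.index.get_level_values('time') >= pd.to_timedelta('09:30:00')]

# Workaround 1: group a one-column DataFrame, then pull out the column.
w1 = data[['N']].groupby(['date', 'name']).sum()['N']
# Workaround 2: group the Series by index level position.
w2 = data['N'].groupby(level=[0, 2]).sum()
# Workaround 3: group the DataFrame, then select the column.
w3 = data.groupby(['date', 'name'])['N'].sum()

for result in (w1, w2, w3):
    print(result.tolist(), list(result.index.names))
```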
Upvotes: 2
Reputation: 2889
According to the post "Python Pandas - how to do group by on a multiindex", grouping on a MultiIndex should be done with the level parameter, like this:
dataGB = data['N'].groupby(level=[0,2])
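The level parameter also accepts level names, which may read more clearly than positions and does not depend on the order of the index levels. A small sketch, assuming a Series shaped like the one in the question (the variable names are illustrative):

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays(
    [[pd.Timestamp('2016-02-15')] * 2,
     [pd.Timedelta('10:10:00'), pd.Timedelta('11:10:00')],
     ['A', 'A']],
    names=['date', 'time', 'name'])
s = pd.Series([2, 3], index=idx, name='N')

# Group by level position (as in the line above) ...
by_position = s.groupby(level=[0, 2]).sum()
# ... or by level name; both select the same two index levels.
by_name = s.groupby(level=['date', 'name']).sum()
print(by_name)
```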
Upvotes: 1