Reputation: 1061
I am trying to group by one/two columns, Sum the value of forth column and find mean in fifth column. Every operation to be written in separate output. Feels a bit tricky to make even a start.
Input: No header, has more than 100k rows
StartTime, EndTime,Day,SumCount,UniqueCount
00:00:00,01:00:00,Mon,13534,594
01:00:00,02:00:00,Mon,16674,626
02:00:00,03:00:00,Mon,23736,671
03:00:00,04:00:00,Mon,16977,671
00:00:00,01:00:00,Tue,17262,747
01:00:00,02:00:00,Tue,19072,777
02:00:00,03:00:00,Tue,18275,785
03:00:00,04:00:00,Tue,13589,757
04:00:00,05:00:00,Tue,16053,735
05:00:00,06:00:00,Tue,11440,636
I am trying to find is
StartTime
& EndTime
find Sum of SumCount
and Median of UniqueCount
Day
find Sum of SumCount
and Median of UniqueCount
I am lost how to put the two types of groupby in one program and get two different output.
Upvotes: 1
Views: 580
Reputation: 294338
Try:
df.groupby(['StartTime', 'EndTime']).agg({'SumCount': ['sum'],
'UniqueCount': {'median': lambda x: np.median(x).round(0)}})
Or:
df.groupby(['Day']).agg({'SumCount': ['sum'],
'UniqueCount': {'median': lambda x: np.median(x).round(0)}})
Upvotes: 1