Reputation: 958
I have data that looks like
Name,Report_ID,Amount,Flag,Actions
Fizz,123,5,,A
Fizz,123,10,Y,A
Buzz,456,10,,B
Buzz,456,40,,C
Buzz,456,70,,D
Bazz,678,100,Y,F
From these individual operations, i'd like to create a new dataframe that captures various statistics / meta name. Mostly summations and counts of items / counts of unique entries. I'd like the output of the dataframe to look like the following:
Report_ID,Number of Flags,Number of Entries, Total,Unique Actions
123,1,2,15,1
456,0,3,120,3
678,1,1,100,1
I've tried using groupby
, but I cannot merge all of the individual groupby objects back together correctly. So far I've tried
totals = raw_data.groupby('Report_ID')['Amount'].sum()
event_count = raw_data.groupby('Report_ID').size()
num_actions = raw_data.groupby('Report_ID').Actions.nunique()
output = pd.concat([totals,event_count,num_actions])
When I try this i get TypeError: cannot concatenate a non-NDFrame object
. Any help would be appreciated!
Upvotes: 0
Views: 53
Reputation: 109546
You just need to specify axis=1
when concatenating:
event_count.name = 'Event Count' # Name the Series, as you did not group on one.
>>> pd.concat([totals, event_count, num_actions], axis=1)
Amount Event Count Actions
Report_ID
123 15 2 1
456 120 3 3
678 100 1 1
Upvotes: 0
Reputation: 294298
You can use agg
on the groupby
f = dict(Flag=['count', 'size'], Amount='sum', Actions='nunique')
df.groupby('Report_ID').agg(f)
Flag Amount Actions
count size sum nunique
Report_ID
123 1 2 15 1
456 0 3 120 3
678 1 1 100 1
Upvotes: 1