Reputation: 59
Panadas - sum of each column based on group by first column
I have this text file which has Table and other 3 other columns indicating Select, Update and Insert. I would like to do group by table and sum of each column and grand total at the end.
df=data.groupby(['Table'])
print df.groupby(['Table'])["Select","Update","Insert"].agg('sum')
Text file has data in this format
Table Select Update Insert
A 10 8 5
B 12 2 0
C 10 2 4
B 19 3 1
D 13 0 5
A 11 7 3
Expected output
Table Select Update Insert
A 21 15 8
B 31 5 1
C 10 2 4
D 13 0 5
Total 75 22 18
df.groupby with sum isn't aggregating data properly for every column. If aggregation is done only on one column then it is good but output on my terminal is all messed up.
Appreciate your help!
Upvotes: 1
Views: 3528
Reputation: 59
It appears that when loading data from .log file data isn't framed correctly for pandas' to process
This is how the data is being loaded
df=pd.DataFrame(data)
print df
Output of frame I get,
Table ... Insert
0 Table Select Update Insert ... NaN
1 A 10 8 5 ... NaN
2 B 12 2 0 ... NaN
3 C 10 2 4 ... NaN
4 B 19 3 1 ... NaN
5 D 13 0 5 ... NaN
6 A 11 7 3 ... NaN
versus
when I load in data frame using below,
data={'Table':['A','B','C','B','D','A'],'Select':[10,12,10,19,13,11],'Update':[8,2,2,3,0,7],'Insert':[5,0,4,1,5,3]}
output of print df is
{'Table': ['A', 'B', 'C', 'B', 'D', 'A'], 'Update': [8, 2, 2, 3, 0, 7], 'Select': [10, 12, 10, 19, 13, 11], 'Insert': [5, 0, 4, 1, 5, 3]}
and pivot_table provides the output as expected.
jitesh singla: If you don't mind, can you please provide details on how pivot_table is working with group by on Table column and aggregating data for other columns.
Upvotes: 0
Reputation: 304
you can try using pandas "pivot_table" function with margins =True
data={'Table':['A','B','C','B','D','A'],'Select':[10,12,10,19,13,11],'Update':[8,2,2,3,0,7],'Insert':[5,0,4,1,5,3]}
df =pd.DataFrame(data)
df2 =df.pivot_table(index ='Table',
margins=True,
margins_name='Total', # defaults to 'All'
aggfunc=sum)
df2.reset_index(inplace =True)
df2[['Table','Select','Update','Insert']]
And you will get the required output :
Table Select Update Insert
0 A 21 15 8
1 B 31 5 1
2 C 10 2 4
3 D 13 0 5
4 Total 75 22 18
Hope this helps!
Upvotes: 1
Reputation: 59
Table ...
A 10 8 5 0.0 ... 0.0
A 11 7 3 0.0 ... 0.0
B 12 2 0 0.0 ... 0.0
B 19 3 1 0.0 ... 0.0
C 10 2 4 0.0 ... 0.0
D 13 0 5 0.0 ... 0.0
Table Select Update Insert 0.0 ... 0.0
[7 rows x 3 columns]
This is the output I get with df.groupby(by='Table').sum()
Upvotes: 0
Reputation: 4827
You can try: df.groupby(by='Table').sum()
for the aggregate table:
Select Update Insert
Table
A 21 15 8
B 31 5 1
C 10 2 4
D 13 0 5
And df.groupby(by='Table').sum().sum()
for the totals:
Select 75
Update 22
Insert 18
dtype: int64
Upvotes: 2