Pat
Pat

Reputation: 59

Pandas dataframe - sum of each column based on group

Panadas - sum of each column based on group by first column

I have this text file which has Table and other 3 other columns indicating Select, Update and Insert. I would like to do group by table and sum of each column and grand total at the end.

df=data.groupby(['Table'])
print df.groupby(['Table'])["Select","Update","Insert"].agg('sum')

Text file has data in this format
Table Select Update Insert
A        10      8      5
B        12      2      0
C        10      2      4
B        19      3      1
D        13      0      5
A        11      7      3

Expected output
Table Select Update Insert
A        21      15     8
B        31      5      1
C        10      2      4
D        13      0      5
Total    75      22    18

df.groupby with sum isn't aggregating data properly for every column. If aggregation is done only on one column then it is good but output on my terminal is all messed up.

Appreciate your help!

Upvotes: 1

Views: 3528

Answers (4)

Pat
Pat

Reputation: 59

It appears that when loading data from .log file data isn't framed correctly for pandas' to process

This is how the data is being loaded


df=pd.DataFrame(data)
print df

Output of frame I get,

                        Table  ...  Insert
0  Table Select Update Insert  ...     NaN
1   A        10      8      5  ...     NaN
2   B        12      2      0  ...     NaN
3   C        10      2      4  ...     NaN
4   B        19      3      1  ...     NaN
5   D        13      0      5  ...     NaN
6   A        11      7      3  ...     NaN

versus  
when I load in data frame using below,
data={'Table':['A','B','C','B','D','A'],'Select':[10,12,10,19,13,11],'Update':[8,2,2,3,0,7],'Insert':[5,0,4,1,5,3]}

output of print df is 
{'Table': ['A', 'B', 'C', 'B', 'D', 'A'], 'Update': [8, 2, 2, 3, 0, 7], 'Select': [10, 12, 10, 19, 13, 11], 'Insert': [5, 0, 4, 1, 5, 3]}

and pivot_table provides the output as expected.

jitesh singla: If you don't mind, can you please provide details on how pivot_table is working with group by on Table column and aggregating data for other columns.

Upvotes: 0

jitesh2796
jitesh2796

Reputation: 304

you can try using pandas "pivot_table" function with margins =True

data={'Table':['A','B','C','B','D','A'],'Select':[10,12,10,19,13,11],'Update':[8,2,2,3,0,7],'Insert':[5,0,4,1,5,3]}

df =pd.DataFrame(data)

df2 =df.pivot_table(index ='Table',
               margins=True,
               margins_name='Total', # defaults to 'All'
               aggfunc=sum)

df2.reset_index(inplace =True)

df2[['Table','Select','Update','Insert']]

And you will get the required output :

   Table  Select  Update  Insert
0      A      21      15       8
1      B      31       5       1
2      C      10       2       4
3      D      13       0       5
4  Total      75      22      18

Hope this helps!

Upvotes: 1

Pat
Pat

Reputation: 59

Table                               ...        
A        10      8      5      0.0  ...     0.0
A        11      7      3      0.0  ...     0.0
B        12      2      0      0.0  ...     0.0
B        19      3      1      0.0  ...     0.0
C        10      2      4      0.0  ...     0.0
D        13      0      5      0.0  ...     0.0
Table Select Update Insert     0.0  ...     0.0

[7 rows x 3 columns]

This is the output I get with df.groupby(by='Table').sum()

Upvotes: 0

René
René

Reputation: 4827

You can try: df.groupby(by='Table').sum() for the aggregate table:

       Select  Update  Insert
Table                        
A          21      15       8
B          31       5       1
C          10       2       4
D          13       0       5

And df.groupby(by='Table').sum().sum() for the totals:

Select    75
Update    22
Insert    18
dtype: int64

Upvotes: 2

Related Questions