EA00
EA00

Reputation: 633

Pandas, use apply method?

I have an excel file that looks like this:

 A                           B  C   D    E       F        G

run_1_clust_1.out:          GLU 2   HN  2.07    3851    -0.90   
                            GLY 1   HN  2.09    3196    -0.90
                                3   HN  2.05    3553    -0.90   
                                    HT1 2.12    2828    -0.91   
                                    HT2 2.05    3516    -0.90 
run_1_clust_2.out:          GLU 2   HN  2.12    1940    -0.90   
                            GLY 1   HN  2.33    4030    -0.90
                                3   HN  2.43    3994    -0.90   
                                    HT1 2.11    2833    -0.91   
                                    HT2 2.05    3242    -0.90

I would like to group the columns E, F and G by columns B, C and D. For an output that looks like this:

 run_1_clust_1.out:          GLY 1  HN  2.09    3196    -0.90
 run_1_clust_2.out:          GLY 1  HN  2.33    4030    -0.90     
 run_1_clust_1.out:          GLU 2  HN  2.07    3851    -0.90 
 run_1_clust_2.out:          GLU 2  HN  2.12    1940    -0.90
 run_1_clust_1.out:          GLY 3  HN  2.05    3553    -0.90
 run_1_clust_2.out:          GLY 3  HN  2.43    3994    -0.90
 run_1_clust_1.out:          GLY 3  HT1 2.12    2828    -0.91
 ....

I'm using pandas, but I'm not sure why I get the AttributeError telling me to use the 'apply' method.

import pandas as pd

writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')

xl = pd.ExcelFile('test.xlsx')
df = xl.parse("Sheet1")
df.columns = df[['a','b','c','d','e','f','g']]
df = df.groupby(['b','c','d'])
df.to_excel(writer, sheet_name="Sheet1")

writer.save()    

Upvotes: 1

Views: 109

Answers (1)

jpp
jpp

Reputation: 164673

Try this. The main differences are: I've specified a calculation to perform on grouping and reset index so that the output is a dataframe.

import pandas as pd

writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')

xl = pd.ExcelFile('test.xlsx')
df = xl.parse("Sheet1")
df.columns = df[['a','b','c','d','e','f','g']]

group_cols = ['b','c','d']
sum_cols = ['e', 'f', 'g']

df = df[group_cols+sum_cols].groupby(group_cols).sum().reset_index()
df.to_excel(writer, sheet_name="Sheet1")

writer.save()    

Upvotes: 1

Related Questions