Reputation: 633
I have an excel file that looks like this:
A B C D E F G
run_1_clust_1.out: GLU 2 HN 2.07 3851 -0.90
GLY 1 HN 2.09 3196 -0.90
3 HN 2.05 3553 -0.90
HT1 2.12 2828 -0.91
HT2 2.05 3516 -0.90
run_1_clust_2.out: GLU 2 HN 2.12 1940 -0.90
GLY 1 HN 2.33 4030 -0.90
3 HN 2.43 3994 -0.90
HT1 2.11 2833 -0.91
HT2 2.05 3242 -0.90
I would like to group the columns E, F and G by columns B, C and D. For an output that looks like this:
run_1_clust_1.out: GLY 1 HN 2.09 3196 -0.90
run_1_clust_2.out: GLY 1 HN 2.33 4030 -0.90
run_1_clust_1.out: GLU 2 HN 2.07 3851 -0.90
run_1_clust_2.out: GLU 2 HN 2.12 1940 -0.90
run_1_clust_1.out: GLY 3 HN 2.05 3553 -0.90
run_1_clust_2.out: GLY 3 HN 2.43 3994 -0.90
run_1_clust_1.out: GLY 3 HT1 2.12 2828 -0.91
....
I'm using pandas, but I'm not sure why I get the AttributeError telling me to use the 'apply' method.
import pandas as pd
writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')
xl = pd.ExcelFile('test.xlsx')
df = xl.parse("Sheet1")
df.columns = df[['a','b','c','d','e','f','g']]
df = df.groupby(['b','c','d'])
df.to_excel(writer, sheet_name="Sheet1")
writer.save()
Upvotes: 1
Views: 109
Reputation: 164673
Try this. The main differences are: I've specified a calculation to perform on grouping and reset index so that the output is a dataframe.
import pandas as pd
writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')
xl = pd.ExcelFile('test.xlsx')
df = xl.parse("Sheet1")
df.columns = df[['a','b','c','d','e','f','g']]
group_cols = ['b','c','d']
sum_cols = ['e', 'f', 'g']
df = df[group_cols+sum_cols].groupby(group_cols).sum().reset_index()
df.to_excel(writer, sheet_name="Sheet1")
writer.save()
Upvotes: 1