ilam engl
ilam engl

Reputation: 1570

Way of summing all fields in rows with identical name in pandas df?

I'm banging my head against a wall: I'd like to add all values by row for all rows that have an identical value in the column 'fusion'. Here a dummy example:

import pandas as pd

rows_l = [['AFF1_KMT2A', 3.0, 1.0, 1.0, 1.0, 4, 6.0],
 ['AFF1_KMT2A', 3.0, 2.0, 1.0, 0.0, 3, 6.0],
 ['TCF3_PBX1', 3.0, 1.0, 1.0, 0.0, 3, 5.0],
 ['TCF3_PBX1', 0.0, 0.0, 0.0, 1.0, 1, 1.0],
 ['ABL1_BCR', 1.0, 1.0, 1.0, 2.0, 4, 5.0]]
col_l = ['fusion', 'FusionCatcher', 'Manta', 'STARfusion', 'TopHat','tool_count', 'tot']

my_df = pd.DataFrame(rows_l, columns=col_l)

Gives me this my_df:

fusion FusionCatcher Manta STARfusion TopHat tool_count tot
0 ABL1_BCR 1.0 1.0 1.0 2.0 4 5.0
1 AFF1_KMT2A 3.0 2.0 1.0 0.0 3 6.0
2 TCF3_PBX1 3.0 1.0 1.0 0.0 3 5.0
3 AFF1_KMT2A 3.0 1.0 1.0 1.0 4 6.0
4 TCF3_PBX1 0.0 0.0 0.0 1.0 1 1.0

Of course in my real file the identical lines are not always one line apart, but all over the place...

My output should look like this:

fusion FusionCatcher Manta STARfusion TopHat tool_count tot
0 AFF1_KMT2A 6.0 3.0 2.0 1.0 7 12.0
1 TCF3_PBX1 3.0 1.0 1.0 1.0 4 6.0
2 ABL1_BCR 1.0 1.0 1.0 2.0 4 5.0

So far I've only found df['whatever_you_want_to_sum'].groupby(df['col_names_if_same_wil_be_bundled']) but as this seems to work only for 2 columns that does not help me much.

Is there any particular python pandas way to do this or should I just continue to write my for-loops until the prints look right?

Upvotes: 1

Views: 158

Answers (1)

hbstha123
hbstha123

Reputation: 1646

Can you try using pivot tables?

pd.pivot_table(my_df, index="fusion", values=col_l, aggfunc=np.sum)

You also need to import numpy as np beforehand. You get a pivot table as shown:enter image description here

Upvotes: 1

Related Questions