Way of summing all fields in rows with identical name in pandas df?

Question

I'm banging my head against a wall: I'd like to add all values by row for all rows that have an identical value in the column 'fusion'. Here a dummy example:

import pandas as pd

rows_l = [['AFF1_KMT2A', 3.0, 1.0, 1.0, 1.0, 4, 6.0],
 ['AFF1_KMT2A', 3.0, 2.0, 1.0, 0.0, 3, 6.0],
 ['TCF3_PBX1', 3.0, 1.0, 1.0, 0.0, 3, 5.0],
 ['TCF3_PBX1', 0.0, 0.0, 0.0, 1.0, 1, 1.0],
 ['ABL1_BCR', 1.0, 1.0, 1.0, 2.0, 4, 5.0]]
col_l = ['fusion', 'FusionCatcher', 'Manta', 'STARfusion', 'TopHat','tool_count', 'tot']

my_df = pd.DataFrame(rows_l, columns=col_l)

Gives me this my_df:

	fusion	FusionCatcher	Manta	STARfusion	TopHat	tool_count	tot
0	ABL1_BCR	1.0	1.0	1.0	2.0	4	5.0
1	AFF1_KMT2A	3.0	2.0	1.0	0.0	3	6.0
2	TCF3_PBX1	3.0	1.0	1.0	0.0	3	5.0
3	AFF1_KMT2A	3.0	1.0	1.0	1.0	4	6.0
4	TCF3_PBX1	0.0	0.0	0.0	1.0	1	1.0

Of course in my real file the identical lines are not always one line apart, but all over the place...

My output should look like this:

	fusion	FusionCatcher	Manta	STARfusion	TopHat	tool_count	tot
0	AFF1_KMT2A	6.0	3.0	2.0	1.0	7	12.0
1	TCF3_PBX1	3.0	1.0	1.0	1.0	4	6.0
2	ABL1_BCR	1.0	1.0	1.0	2.0	4	5.0

So far I've only found df['whatever_you_want_to_sum'].groupby(df['col_names_if_same_wil_be_bundled']) but as this seems to work only for 2 columns that does not help me much.

Is there any particular python pandas way to do this or should I just continue to write my for-loops until the prints look right?

Way of summing all fields in rows with identical name in pandas df?

Answers (1)

Related Questions