apine19
apine19

Reputation: 1

Pandas-- aggregating multiple columns with the same name?

I have multiple columns that have the same name (ie. 3 columns named "box"), and I need to have one column that sums the values so that I can concatenate my dataframes. There are multiple instances of this needing to happen.

I've tried df1 = df.groupby(level=0, axis=1).sum() but I get one column with no values. (Ie. in the first row, "box" : 9, "box" : 1, "box" : 4 should become one column named "box" with value of 14, but it's showing one column named "box" but no value.)

Upvotes: 0

Views: 414

Answers (1)

MyNameIsCaleb
MyNameIsCaleb

Reputation: 4489

You can sum based on column names using df['column_name'].sum(axis=1) and setting that for the column you want the results in.

Simple example below:

>>> df
   a  b   c  b   e
0  1  5  dd  1   6
1  2  9  ee  1  10
2  3  1  ff  1   2
>>> df['f'] = df['b'].sum(axis=1)
>>> df
   a  b   c  b   e   f
0  1  5  dd  1   6   6
1  2  9  ee  1  10  10
2  3  1  ff  1   2   2

Notice it isn't summing every value in the row, just the values from columns named 'b'.

This works because when you call a column by name and it's a duplicate, pandas recognizes that fact. For example:

>>> df['a']
0    1
1    2
2    3
Name: a, dtype: int64
>>> df['b']
   b  b
0  5  1
1  9  1
2  1  1
>>> df.columns.get_loc('a')
0
>>> df.columns.get_loc('b')
array([False,  True, False,  True, False, False])

I would recommend renaming your duplicate column names and avoiding the uncertainty that comes with having duplicate names in the future. This question has a lot of suggestions for how to do that as well as how to avoid it during import from other sources.

Upvotes: 2

Related Questions