Reputation: 1
I have multiple columns that have the same name (ie. 3 columns named "box"), and I need to have one column that sums the values so that I can concatenate my dataframes. There are multiple instances of this needing to happen.
I've tried df1 = df.groupby(level=0, axis=1).sum()
but I get one column with no values. (Ie. in the first row, "box" : 9, "box" : 1, "box" : 4 should become one column named "box" with value of 14, but it's showing one column named "box" but no value.)
Upvotes: 0
Views: 414
Reputation: 4489
You can sum based on column names using df['column_name'].sum(axis=1)
and setting that for the column you want the results in.
Simple example below:
>>> df
a b c b e
0 1 5 dd 1 6
1 2 9 ee 1 10
2 3 1 ff 1 2
>>> df['f'] = df['b'].sum(axis=1)
>>> df
a b c b e f
0 1 5 dd 1 6 6
1 2 9 ee 1 10 10
2 3 1 ff 1 2 2
Notice it isn't summing every value in the row, just the values from columns named 'b'.
This works because when you call a column by name and it's a duplicate, pandas
recognizes that fact. For example:
>>> df['a']
0 1
1 2
2 3
Name: a, dtype: int64
>>> df['b']
b b
0 5 1
1 9 1
2 1 1
>>> df.columns.get_loc('a')
0
>>> df.columns.get_loc('b')
array([False, True, False, True, False, False])
I would recommend renaming your duplicate column names and avoiding the uncertainty that comes with having duplicate names in the future. This question has a lot of suggestions for how to do that as well as how to avoid it during import from other sources.
Upvotes: 2