Shraddha Avasthy
Shraddha Avasthy

Reputation: 162

Remove duplicate column indexes from pandas dataframe

I am looking for a solution to removing duplicate column indexes in my dataframe- what I need to do is to add the values in the duplicate columns row by row and then keep only 1 of these columns with the summed value

df = pd.DataFrame(np.array([[0,0,0,1,0,0,0], [0,1,0,0,0,0,0],
                        [0,0,0,0,0,0,1]]), columns=[1,1,2,2,2,3,3], index=[1,2,3])

   1  1  2  2  2  3  3
1  0  0  0  1  0  0  0
2  0  1  0  0  0  0  0
3  0  0  0  0  0  0  1

should become

   1  2  3
1  0  1  0
2  1  0  0
3  0  0  1

Upvotes: 4

Views: 1416

Answers (4)

BENY
BENY

Reputation: 323226

you do not need groupby here

df.sum(level=0,axis=1)
Out[358]: 
   1  2  3
1  0  1  0
2  1  0  0
3  0  0  1

Upvotes: 2

user3483203
user3483203

Reputation: 51165

Simply group by columns:

df.groupby(df.columns, 1).sum()

   1  2  3
1  0  1  0
2  1  0  0
3  0  0  1

Or as pointed out by @user2285236

df.groupby(axis=1, level=0).sum()

Upvotes: 1

Sheldore
Sheldore

Reputation: 39042

Due to the missing data, an ugly attempt to your problem:

import pandas as pd
df = pd.DataFrame(np.array([[0,0,0,1,0,0,0], [0,1,0,0,0,0,0], 
                            [0,0,0,0,0,0,1]]))
df.columns = [1,1,2,2,2,3,3]
df1 = df.groupby(lambda x:x, axis=1).sum()
df1.index = range(1,4)
df1

outputs the desired dataframe you posted. The following df1.index = range(1,4) is just to re-index the rows because they start with 1 in your example.

Upvotes: 1

Prayson W. Daniel
Prayson W. Daniel

Reputation: 15568

Have you tried?

df = df.loc[:,~df.columns.duplicated(keep='last')]

Upvotes: 0

Related Questions