KcH

Reputation: 3502

How to remove one specific duplicate named column in columns of a dataframe?

I have a sample dataframe df with columns as:

  a b c a a b b c c 
0 2 2 1 2 2 1 1 2 2
1 2 2 2 2 2 1 2 1 2
. . .
. . . 

I want to remove only the duplicate columns named 'a' and keep all other columns as they are. The expected output is:

  a b c b b c c 
0 2 2 1 1 1 2 2
1 2 2 2 1 2 1 2

Upvotes: 2

Views: 69

Answers (2)

Stef

Reputation: 30679

Here is a general solution to drop any duplicates of a column, no matter where these columns are in the dataframe and what the content of these columns is.
First we get all positional indexes of the given column name and skip the first occurrence, so only the duplicates remain. Then we "subtract" these indexes from the full set of column positions and select the remaining columns:

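to_drop = 'a'
# positions of every 'a' column except the first one
dup = [i for i, v in enumerate(df.columns) if v == to_drop][1:]
# sorted() keeps the original column order, which a plain set does not guarantee
df = df.iloc[:, sorted(set(range(len(df.columns))) - set(dup))]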

Result:

   a  b  c  b  b  c  c
0  2  2  1  1  1  2  2
1  2  2  2  1  2  1  2
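
The same idea can also be written as a boolean mask instead of positional indexes. This is only a sketch of an alternative, not the code from the answer above; the sample frame is rebuilt from the question, and the names is_repeat, keep and result are made up for illustration.

```python
import pandas as pd

# Sample frame from the question (two rows, repeated column names)
df = pd.DataFrame(
    [[2, 2, 1, 2, 2, 1, 1, 2, 2],
     [2, 2, 2, 2, 2, 1, 2, 1, 2]],
    columns=list("abcaabbcc"),
)

to_drop = "a"
# True for every repeat of a label; the first occurrence of each label is False
is_repeat = df.columns.duplicated()
# Keep every column except the repeats of the chosen label
keep = ~((df.columns == to_drop) & is_repeat)
result = df.loc[:, keep]
print(result)
```

Like the index-based version, this only looks at column names, so it does not matter whether the duplicate columns hold the same values.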

Upvotes: 3

iamklaus

Reputation: 3770

df = df.T.reset_index().drop_duplicates().set_index('index').T
df.columns.name = None  # clear the leftover 'index' label; del on the name fails in newer pandas

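Explanation

Since the duplicate 'a' columns hold identical values, we can transpose the dataframe and reset the index, so that each column becomes a row with its name stored in a regular column: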

df.T.reset_index()


  index  0  1
0     a  2  2
1     b  2  2
2     c  1  2
3     a  2  2
4     a  2  2
5     b  1  1
6     b  1  2
7     c  2  1
8     c  2  2

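Applying drop_duplicates to the transposed frame removes only the rows that repeat both the column name and the values, which here are the extra 'a' columns. It works the same way when more than one column name has exact duplicates; same-named columns with different values are kept.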

Output

   a  b  c  b  b  c  c
0  2  2  1  1  1  2  2
1  2  2  2  1  2  1  2
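
For reference, here are the steps above run one at a time on the sample data from the question. This is a sketch, assuming the frame is built as shown; the intermediate names transposed, deduped and result are only for illustration.

```python
import pandas as pd

# Sample frame from the question (two rows, repeated column names)
df = pd.DataFrame(
    [[2, 2, 1, 2, 2, 1, 1, 2, 2],
     [2, 2, 2, 2, 2, 1, 2, 1, 2]],
    columns=list("abcaabbcc"),
)

# Each column becomes a row; its label moves into a regular column named "index"
transposed = df.T.reset_index()
# Rows that match in label and in every value count as duplicates
deduped = transposed.drop_duplicates()
# Put the labels back and flip to the original orientation
result = deduped.set_index("index").T
# Clear the leftover "index" label on the column axis
result.columns.name = None
print(result)
```

Because drop_duplicates compares values as well as names, this approach relies on the 'a' columns being identical; two 'a' columns with different values would both be kept.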

Upvotes: 2
