How to remove one specific duplicate named column in columns of a dataframe?

Question

I have a sample dataframe df with columns as:

  a b c a a b b c c 
0 2 2 1 2 2 1 1 2 2
1 2 2 2 2 2 1 2 1 2
. . .
. . .

I want to remove only the duplicate columns named 'a' and keep all the others as they are. The expected output is:

  a b c b b c c 
0 2 2 1 1 1 2 2
1 2 2 2 1 2 1 2

Stef · Accepted Answer

Here is a general solution to drop any duplicates of a column, no matter where these columns are in the dataframe and what the content of these columns is.
First we collect the positions of all columns with the given name and skip the first occurrence, so that it is kept. Then we remove these positions from the full list of column positions and select the remaining columns:

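to_drop = 'a'
# positions of every column named to_drop, except the first occurrence
dup = [i for i, v in enumerate(df.columns) if v == to_drop][1:]
# keep the remaining positions in their original order
# (a set difference does not guarantee column order)
df = df.iloc[:, [i for i in range(len(df.columns)) if i not in dup]]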

Result:

   a  b  c  b  b  c  c
0  2  2  1  1  1  2  2
1  2  2  2  1  2  1  2
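
For comparison, the same idea can be sketched with a boolean mask instead of integer positions. This is an alternative, not the approach from the answer above: it assumes pandas is installed, rebuilds the two sample rows from the question, and uses a hypothetical helper name, drop_repeats, chosen only for illustration. Index.duplicated() flags every occurrence of a label after the first, so combining it with a name check marks exactly the repeated 'a' columns.

```python
import pandas as pd

# Sample data rebuilt from the two rows shown in the question.
df = pd.DataFrame(
    [[2, 2, 1, 2, 2, 1, 1, 2, 2],
     [2, 2, 2, 2, 2, 1, 2, 1, 2]],
    columns=list("abcaabbcc"),
)


def drop_repeats(frame, name):
    """Drop every column called `name` except its first occurrence.

    Hypothetical helper for illustration; not part of pandas.
    """
    # True for columns that carry the target name ...
    is_target = frame.columns == name
    # ... and True for any label that has already appeared further left.
    is_repeat = frame.columns.duplicated()
    # Keep everything that is not a repeated occurrence of the target name.
    keep = ~(is_target & is_repeat)
    return frame.loc[:, keep]


result = drop_repeats(df, "a")
print(result)
```

Because the mask is applied in column order, the remaining columns keep their original positions, and duplicates of 'b' and 'c' are left untouched.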
