Reputation: 3502
I have a sample dataframe df with columns as:
a b c a a b b c c
0 2 2 1 2 2 1 1 2 2
1 2 2 2 2 2 1 2 1 2
. . .
. . .
I want to remove the duplicate columns named with only 'a' and keep other as same The expected o/p is:
a b c b b c c
0 2 2 1 1 1 2 2
1 2 2 2 1 2 1 2
Upvotes: 2
Views: 69
Reputation: 30679
Here is a general solution to drop any duplicates of a column, no matter where these columns are in the dataframe and what the content of these columns is.
First we get all column indexes for the given column name and drop the first occurrence. Then we "substract" these indexes from all indexes and return the remaining columns:
to_drop = 'a'
dup = [i for i,v in enumerate(df.columns) if v==to_drop][1:]
df = df.iloc[:, list(set(range(len(df.columns))) - set(dup))]
Result:
a b c b b c c
0 2 2 1 1 1 2 2
1 2 2 2 1 2 1 2
Upvotes: 3
Reputation: 3770
df = df.T.reset_index().drop_duplicates().set_index('index').T
del df.columns.name
Exp
since the column a has only dupe values, so we can simply transpose with reset index
df.T.reset_index()
index 0 1
0 a 2 2
1 b 2 2
2 c 1 2
3 b 1 1
4 b 1 2
5 c 2 1
6 c 2 2
Apply drop_duplicate on above df and only the dupes will get removed. It serves the purpose in those instances too where there are more than one column which has dupe value
Output
a b c b b c c
0 2 2 1 1 1 2 2
1 2 2 2 1 2 1 2
Upvotes: 2