Reputation: 81
Is there a way to drop specific duplicate columns just based on their column names. Values don't matter. Following table has columns with names A, B, C, D which are repetitive. I just want to get rid of the occurrences of column named A except the first one.
A B C D A B C D
18 75 92 24 88 75 49 7
30 56 42 58 60 72 36 47
85 84 28 84 68 49 71 9
71 70 97 99 63 71 1 14
82 63 67 23 74 38 74 37
87 11 47 69 52 24 51 50
46 73 85 55 69 15 3 26
47 100 6 42 26 76 88 54
81 100 79 59 4 19 5 89
93 67 83 51 66 52 52 84
29 26 58 54 31 30 52 17
48 29 7 79 75 90 13 95
77 40 16 30 41 92 98 68
56 11 39 45 65 69 4 75
93 83 98 83 9 21 66 89
2 10 4 29 44 95 53 67
The end result should look something like this
A B C D B C D
18 75 92 24 75 49 7
30 56 42 58 72 36 47
85 84 28 84 49 71 9
71 70 97 99 71 1 14
82 63 67 23 38 74 37
87 11 47 69 24 51 50
46 73 85 55 15 3 26
47 100 6 42 76 88 54
81 100 79 59 19 5 89
93 67 83 51 52 52 84
29 26 58 54 30 52 17
48 29 7 79 90 13 95
77 40 16 30 92 98 68
56 11 39 45 69 4 75
93 83 98 83 21 66 89
2 10 4 29 95 53 67
I have tried using this which deletes all the duplicate columns found and not just A.
df = df.loc[:,~df.columns.duplicated()]
Upvotes: 1
Views: 117
Reputation: 25269
Use iloc
with get_indexer_for
a = df.columns.get_indexer_for(['A'])[1:]
df_final = df.iloc[:, [c for c in range(df.shape[1]) if c not in a]]
Out[1131]:
A B C D B C D
0 18 75 92 24 75 49 7
1 30 56 42 58 72 36 47
2 85 84 28 84 49 71 9
3 71 70 97 99 71 1 14
4 82 63 67 23 38 74 37
5 87 11 47 69 24 51 50
6 46 73 85 55 15 3 26
7 47 100 6 42 76 88 54
8 81 100 79 59 19 5 89
9 93 67 83 51 52 52 84
10 29 26 58 54 30 52 17
11 48 29 7 79 90 13 95
12 77 40 16 30 92 98 68
13 56 11 39 45 69 4 75
14 93 83 98 83 21 66 89
15 2 10 4 29 95 53 67
Upvotes: 1
Reputation: 1767
This should work:
# Mark all columns except `A`s as True
cols = ~(df.columns == 'A')
# Add the first occurrence of `A`:
cols[list(df.columns).index('A')] = True
# Here you get `df` only with first occurrence of `A`:
df.loc[:,cols]
Upvotes: 3
Reputation: 323396
Let us try cumcount
s = df.columns.to_series()
out = df.loc[:,~((s.groupby(s).cumcount()==1) & (s =='A'))]
Out[113]:
A B C D B C D
0 18 75 92 24 75 49 7
1 30 56 42 58 72 36 47
2 85 84 28 84 49 71 9
3 71 70 97 99 71 1 14
4 82 63 67 23 38 74 37
5 87 11 47 69 24 51 50
6 46 73 85 55 15 3 26
7 47 100 6 42 76 88 54
8 81 100 79 59 19 5 89
9 93 67 83 51 52 52 84
10 29 26 58 54 30 52 17
11 48 29 7 79 90 13 95
12 77 40 16 30 92 98 68
13 56 11 39 45 69 4 75
14 93 83 98 83 21 66 89
15 2 10 4 29 95 53 67
Upvotes: 2