Reputation: 273
Consider the following Dataframe
:
AAA3 ABB3 DAT4 DEE3 ABB4 AAA4 DAA3 EAV3 DAC4 DEE4
01/01/2020 1 1.1 1.5 1.2 1.32 1.2 1.2 1 0.9 0.5
02/01/2020 1 1.1 1.5 1.2 1.32 1.2 1.2 1 0.9 0.5
03/01/2020 1 1.1 1.5 1.2 1.32 1.2 1.2 1 0.9 0.5
04/01/2020 1 1.1 1.5 1.2 1.32 1.2 1.2 1 0.9 0.5
The values are not important, so I am giving all columns the same value.
What I want to do is see if the alphabets characters part of my columns
headers
, has an match among the headers, and if it does, remove the header that has an 4, leaving only the name that has a 3.
For example:
There is an AAA3
, as well as an AAA4
. I want to drop the AAA4
column, leaving only AAA3
.
Note that there is a column named DAC4
, but there isn't a DAC3
. So I want to keep my DAC4
column.
I couldn't solve my problem with the following question : Select by partial string from a pandas DataFrame
Upvotes: 2
Views: 77
Reputation: 25249
Create a mask on duplicated of alphabet part. Create another mask where the last char is 3
. Finally slicing using these masks
m = df.columns.str.extract(r'(^[A-Za-z]+)').duplicated(keep=False)
m1 = df.columns.str.endswith('3')
df_final = df.loc[:,(~m | m1).values]
Out[146]:
AAA3 ABB3 DAT4 DEE3 DAA3 EAV3 DAC4
01/01/2020 1 1.1 1.5 1.2 1.2 1 0.9
02/01/2020 1 1.1 1.5 1.2 1.2 1 0.9
03/01/2020 1 1.1 1.5 1.2 1.2 1 0.9
04/01/2020 1 1.1 1.5 1.2 1.2 1 0.9
Upvotes: 2
Reputation: 28699
Step 1: Get a dictionary of similar columns :
from collections import defaultdict
from itertools import chain
d = defaultdict(list)
for entry in df.columns:
d[entry[:-1]].append(entry)
d
defaultdict(list,
{'AAA': ['AAA3', 'AAA4'],
'ABB': ['ABB3', 'ABB4'],
'DAT': ['DAT4'],
'DEE': ['DEE3', 'DEE4'],
'DAA': ['DAA3'],
'EAV': ['EAV3'],
'DAC': ['DAC4']})
Step 2: Get columns that end with 4 :
from itertools import chain
cols_to_drop = list(chain.from_iterable([[ent for ent in value
if ent.endswith("4")]
for key,value in d.items()
if len(value) > 1]))
cols_to_drop
['AAA4', 'ABB4', 'DEE4']
Step 3: Drop columns :
df.drop(columns=cols_to_drop)
AAA3 ABB3 DAT4 DEE3 DAA3 EAV3 DAC4
0 01/01/2020 1 1.1 1.5 1.2 1 0.9
1 02/01/2020 1 1.1 1.5 1.2 1 0.9
2 03/01/2020 1 1.1 1.5 1.2 1 0.9
3 04/01/2020 1 1.1 1.5 1.2 1 0.9
Upvotes: 2