okl

Reputation: 317

pandas string selection to include and exclude

I am trying to select rows whose text contains a word like "AV", but exclude any row whose text contains "AV-DEFAULT".

The code below does not do what I want: rows with "AV-DEFAULT" are still selected.

df = df[df.STRUCTURALSTATUS.str.contains('AV', regex=False, case=False, na=False) & ~ 
(df['STRUCTURALSTATUS'] != 'AV-DEFAULT')]

Upvotes: 0

Views: 2280

Answers (1)

YOLO

Reputation: 21739

A common pattern in the strings you want to select is that they always start or end with AV.
You can use the .str.startswith / .str.endswith string methods to check for this.

import pandas as pd

# sample data frame
df = pd.DataFrame({'names': ['AV', 'AV IE', 'AV (11)', 'AV-EE', 'AG AV', 'O - AV-DEFAULT', 'HEAVY DAMAGE']})

# create a new column; rows that neither start nor end with AV become NaN
df['new_name'] = df['names'].loc[(df.names.str.startswith('AV') | df.names.str.endswith('AV'))]

# output
    names           new_name
0   AV              AV
1   AV IE           AV IE
2   AV (11)         AV (11)
3   AV-EE           AV-EE
4   AG AV           AG AV
5   O - AV-DEFAULT  NaN
6   HEAVY DAMAGE    NaN

Update 1:

df['new_name'] = df.names[df.names.str.contains(r'(?!AV-DEFAULT)AV(?!\w)')]   

Regex Explanation:
1. (?!AV-DEFAULT) A negative lookahead: don't match at a position where the text AV-DEFAULT begins
2. AV Here we match strings which contain AV
3. (?!\w) Don't match AV if it is followed by a letter, digit or underscore, as in HEAVY DAMAGE
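
As a quick check of the three parts above, the pattern can be tried against plain strings with Python's re module. This is only a sketch: the sample strings are the ones used elsewhere in this answer, and the helper name keeps is made up for illustration.

```python
import re

# Pattern from Update 1: "AV" that does not begin "AV-DEFAULT"
# and is not followed by a word character
PATTERN = re.compile(r'(?!AV-DEFAULT)AV(?!\w)')


def keeps(text):
    """Hypothetical helper: True if the pattern is found anywhere in text."""
    return PATTERN.search(text) is not None


samples = ['AV', 'AV-EE', 'AG AV', 'O - AV-DEFAULT', 'HEAVY DAMAGE', 'AVAILABLE(AFP)']
results = {s: keeps(s) for s in samples}

for s, kept in results.items():
    print(f'{s!r}: {kept}')
```

Note that AVAILABLE(AFP) is rejected here, because the A after AV is a word character.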

Update 2: This also keeps strings which start with AVA, such as AVAILABLE. The alternative is anchored as ^AVA rather than ^AV, because ^AV would also match AV-DEFAULT.

df.names[df.names.str.contains(r'(?:(?!AV-DEFAULT)AV(?!\w))|^AVA')]

Update 3: How to run the code.

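import pandas as pd

# sample df
df = pd.DataFrame({'names': ['AV', 'AV IE', 'AV (11)', 'AV-EE', 'AG AV', 'O - AV-DEFAULT', 'HEAVY DAMAGE', 'AVAILABLE(AFP)', 'AV-DEFAULT']})

# get new column; (?:...) is a non-capturing group, which avoids the pandas
# warning about match groups in str.contains
df['new_name'] = df.names[df.names.str.contains(r'(?:(?!AV-DEFAULT)AV(?!\w))|^AVA')]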

print(df)

    names           new_name
0   AV              AV
1   AV IE           AV IE
2   AV (11)         AV (11)
3   AV-EE           AV-EE
4   AG AV           AG AV
5   O - AV-DEFAULT  NaN
6   HEAVY DAMAGE    NaN
7   AVAILABLE(AFP)  AVAILABLE(AFP)
8   AV-DEFAULT      NaN
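
If you want to filter the rows themselves, as in the question, rather than add a column, the same pattern can be used as a boolean mask. A minimal sketch, assuming the column is called STRUCTURALSTATUS as in the question; the sample values, including the missing one, are invented for illustration.

```python
import pandas as pd

# Assumed sample data; only the column name comes from the question
df = pd.DataFrame({'STRUCTURALSTATUS': ['AV', 'AV IE', 'AV-EE', 'O - AV-DEFAULT',
                                        'HEAVY DAMAGE', 'AVAILABLE(AFP)',
                                        'AV-DEFAULT', None]})

# Same pattern as in Update 3, written without a capturing group
pattern = r'(?!AV-DEFAULT)AV(?!\w)|^AVA'

# na=False treats missing values as non-matches, so the mask is purely boolean
mask = df['STRUCTURALSTATUS'].str.contains(pattern, regex=True, na=False)

# Keep only the matching rows
filtered = df[mask]
print(filtered)
```

Unlike the new_name column, this drops the non-matching rows instead of marking them NaN.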

Upvotes: 1
