Reputation: 1357
I have a list of suffixes that I want to remove, say suffixes = ['inc','co','ltd'].
I want to remove these from a column in a Pandas dataframe, and I have been doing this:
df['name'] = df['name'].str.replace('|'.join(suffixes), '')
This works, but I do NOT want to remove the suffix if what remains is numeric. For example, if the name is 123 inc, I don't want to strip the 'inc'. Is there a way to add this condition in the code?
Upvotes: 1
Views: 55
Reputation: 82765
Use a regex with a negative lookbehind.
Ex:
import pandas as pd
suffixes = ['inc','co','ltd']
df = pd.DataFrame({"Col": ["Abc inc", "123 inc", "Abc co", "123 co"]})
df['Col_2'] = df['Col'].str.replace(r"(?<!\d) \b(" + '|'.join(suffixes) + r")\b", '', regex=True)
print(df)
Output:
       Col    Col_2
0  Abc inc      Abc
1  123 inc  123 inc
2   Abc co      Abc
3   123 co   123 co
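Note that the lookbehind only inspects the single character right before the space, so it is an approximation of "what remains is numeric". A small sketch with the standard `re` module (the helper name `strip_suffix_lookbehind` is made up for illustration) shows how the same pattern treats a couple of edge cases:

```python
import re

suffixes = ['inc', 'co', 'ltd']
# Same pattern as above: a space plus a whole-word suffix,
# where the character before the space is not a digit.
pattern = re.compile(r"(?<!\d) \b(" + "|".join(suffixes) + r")\b")

def strip_suffix_lookbehind(name):
    """Hypothetical helper: apply the lookbehind pattern to one string."""
    return pattern.sub("", name)

for name in ["Abc inc", "123 inc", "Abc123 inc", "12a co"]:
    print(repr(name), "->", repr(strip_suffix_lookbehind(name)))
```

Here "Abc123 inc" keeps its suffix because the name ends in a digit, even though the remainder is not purely numeric, while "12a co" loses it. Whether that matters depends on the data.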
Upvotes: 2
Reputation: 7591
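Try prefixing the suffixes with ^([^0-9]+). This regex means "one or more non-digit characters from the start of the string", so a name that begins with a digit never matches. The prefix is part of the match, so capture it and put it back with \1; otherwise the whole name is removed along with the suffix. The code would look like this:
non_numeric_regex = r"^([^0-9]+)"
suffixes = ['inc','co','ltd']
# One capture group for the prefix, followed by any of the suffixes
regex_w_suffixes = non_numeric_regex + "(?:" + '|'.join(suffixes) + ")"
# \1 restores the captured prefix, so only the suffix is dropped
df['name'] = df['name'].str.replace(regex_w_suffixes, r"\1", regex=True)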
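Another option is to keep the condition out of the regex entirely: strip the suffix first, then fall back to the original name when only digits would be left. This is a minimal sketch using the standard `re` module; the helper name `strip_suffix_unless_numeric` is made up, and anchoring the suffix at the end of the string is an assumption about what "suffix" means here.

```python
import re

suffixes = ['inc', 'co', 'ltd']
# A whole-word suffix at the end of the string, with surrounding whitespace.
suffix_re = re.compile(r"\s*\b(?:" + "|".join(map(re.escape, suffixes)) + r")\b\s*$")

def strip_suffix_unless_numeric(name):
    """Hypothetical helper: drop a trailing suffix unless only digits would remain."""
    stripped = suffix_re.sub("", name)
    # Keep the original name when the remainder is purely numeric.
    return name if stripped.isdigit() else stripped

for name in ["Abc inc", "123 inc", "Abc123 ltd", "Abc co"]:
    print(repr(name), "->", repr(strip_suffix_unless_numeric(name)))
```

On a DataFrame this could be applied with df['name'] = df['name'].map(strip_suffix_unless_numeric), assuming the column contains only strings (no missing values).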
Upvotes: 1