formicaman

Reputation: 1357

Pandas - Replace substrings from a column if not numeric

I have a list of suffixes I want to remove, say suffixes = ['inc','co','ltd']. I want to strip these from a column in a Pandas dataframe, and I have been doing this: df['name'] = df['name'].str.replace('|'.join(suffixes), '', regex=True).

This works, but I do NOT want to remove the suffix if what remains is numeric. For example, if the name is 123 inc, I don't want to strip the 'inc'. Is there a way to add this condition to the code?
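Taking the condition literally (strip the suffix unless what is left is purely digits), one option is to strip first and then put the original value back wherever the remainder is numeric. This is a minimal sketch, assuming "numeric" means digits only, that the suffix sits at the end of the name, and that the column has no missing values (str.fullmatch returns NaN for those). The sample names are made up for illustration.

```python
import pandas as pd

suffixes = ['inc', 'co', 'ltd']
# A whole-word suffix at the end of the name, plus any whitespace around it.
# \b stops 'co' from matching inside a word such as 'Tacos'.
suffix_pattern = r"\s*\b(?:" + "|".join(suffixes) + r")\b\s*$"

# Hypothetical sample data.
df = pd.DataFrame({"name": ["Abc inc", "123 inc", "Abc co", "123 co", "Abc 123 ltd", "Tacos"]})

# Candidate result: every name with its trailing suffix removed.
stripped = df["name"].str.replace(suffix_pattern, "", regex=True)

# True where the remainder is nothing but digits.
is_numeric = stripped.str.fullmatch(r"\d+")

# Keep the stripped value where the remainder is not numeric, else the original.
df["name"] = stripped.where(~is_numeric, df["name"])
print(df)
```

Unlike a check on the single character before the suffix, this tests the whole remainder, so "Abc 123 ltd" becomes "Abc 123" while "123 inc" is left alone.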

Upvotes: 1

Views: 55

Answers (2)

Rakesh

Reputation: 82765

Use a regex with a negative lookbehind: (?<!\d) makes the pattern skip the suffix when the character right before the space is a digit.

Ex:

import pandas as pd

suffixes = ['inc','co','ltd']

df = pd.DataFrame({"Col": ["Abc inc", "123 inc", "Abc co", "123 co"]})
df['Col_2'] = df['Col'].str.replace(r"(?<!\d) \b(" + '|'.join(suffixes) + r")\b", '', regex=True)
print(df)

Output:

       Col    Col_2
0  Abc inc      Abc
1  123 inc  123 inc
2   Abc co      Abc
3   123 co   123 co
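To see exactly what the lookbehind checks, the same pattern can be run on plain strings with Python's re module. This sketch uses a few extra sample names (made up for illustration) showing that (?<!\d) inspects only the one character immediately before the space, not the whole remainder.

```python
import re

suffixes = ['inc', 'co', 'ltd']
# Same pattern as above: a space not preceded by a digit, then a whole-word suffix.
pattern = r"(?<!\d) \b(" + "|".join(suffixes) + r")\b"

names = ["Abc inc", "123 inc", "A1 inc", "Abc 123 ltd"]
results = {name: re.sub(pattern, "", name) for name in names}
for name, cleaned in results.items():
    print(repr(name), "->", repr(cleaned))
```

Here "A1 inc" and "Abc 123 ltd" keep their suffixes because a digit sits right before the space, even though the rest of the name is not numeric. Python's re only allows fixed-width lookbehinds, which is why the check is limited to a single character.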

Upvotes: 2

ibarrond

Reputation: 7591

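Try prefixing the suffixes with ^[^0-9]+. It is a regex that means "one or more non-digit characters, starting at the beginning of the string", so a name that starts with a digit (such as 123 inc) never matches. Because that prefix is part of the match, capture it in a group and substitute it back with \1; otherwise the replacement deletes the whole name instead of just the suffix. The code would look like this:

import pandas as pd

non_numeric_regex = r"^([^0-9]+?)\s*"  # non-numeric text before the suffix, captured as group 1
suffixes = ['inc','co','ltd']
regex_w_suffixes = non_numeric_regex + r"\b(?:" + '|'.join(suffixes) + r")\b"
df['name'] = df['name'].str.replace(regex_w_suffixes, r"\1", regex=True)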
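A quick way to check what the ^[^0-9]+ prefix matches, and why the matched text has to be kept, is to try it with Python's re module on single strings. This is an illustrative sketch; the sample names are made up, and the capture-group pattern is one possible way to keep the prefix, not the only one.

```python
import re

suffix_alt = "|".join(['inc', 'co', 'ltd'])

# ^[^0-9]+ matches one or more non-digit characters from the start of the string.
prefix_match = re.match(r"^[^0-9]+", "Abc inc")
print(prefix_match.group())  # 'Abc inc': letters and spaces both count as non-digits
digit_start = re.match(r"^[^0-9]+", "123 inc")
print(digit_start)  # None: the string starts with a digit

# Without a group the prefix is part of the match, so the whole name is replaced.
without_group = re.sub(r"^[^0-9]+(?:" + suffix_alt + ")", "", "Abc inc")
print(repr(without_group))  # ''

# Capturing the prefix and substituting it back with \1 keeps the name.
keep_prefix = r"^([^0-9]+?)\s*\b(?:" + suffix_alt + r")\b"
with_group = re.sub(keep_prefix, r"\1", "Abc inc")
numeric_name = re.sub(keep_prefix, r"\1", "123 inc")
print(with_group)    # Abc
print(numeric_name)  # 123 inc
```

Note that this prefix condition only requires the name to start with a non-digit; a name like "Abc 123 inc" is left unchanged because the digits interrupt the prefix before the suffix is reached.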

Upvotes: 1
