user106742

Reputation: 190

DataFrame.replace(r'regex', regex=True) not working

I have a dataframe with a column with individuals' names:

name
Mr. Salmon
Mr Salmon
Ms. Salmon
Mrs. Salmon
Mrs Salmon
...

I would like to remove all the honorifics. I built the following regex on regex101.com and confirmed that it matches all of them:

(^[Mm]([Rr]|[Ss]|[Xx]|[Rr][Ss]|[Ii][Ss]+)\.?\s)|(^[Mm][Ii][Ss][Tt][Ee][Rr]\.?\s)|(^[Mm][Ii][Ss]+[Uu][Ss]\.?\s)
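The same pattern can also be checked locally with Python's built-in `re` module before pandas gets involved. The pattern uses only basic syntax, so it should behave in Python as it did on regex101. Below is a minimal sketch. `HONORIFIC`, `samples` and `cleaned` are illustrative names, and the extra sample names (Miss, Mister, Missus, Mx) are hypothetical additions that exercise the other branches of the pattern:

```python
import re

# The question's pattern, split into adjacent raw-string literals for readability.
HONORIFIC = re.compile(
    r"(^[Mm]([Rr]|[Ss]|[Xx]|[Rr][Ss]|[Ii][Ss]+)\.?\s)"
    r"|(^[Mm][Ii][Ss][Tt][Ee][Rr]\.?\s)"
    r"|(^[Mm][Ii][Ss]+[Uu][Ss]\.?\s)"
)

# The question's names plus a few hypothetical extras covering the other branches.
samples = [
    "Mr. Salmon", "Mr Salmon", "Ms. Salmon", "Mrs. Salmon", "Mrs Salmon",
    "Miss Salmon", "Mister Salmon", "Missus Salmon", "Mx Salmon",
]

# re.sub with an empty replacement deletes the matched honorific and its trailing space.
cleaned = [HONORIFIC.sub("", name) for name in samples]
for before, after in zip(samples, cleaned):
    print(f"{before!r:18} -> {after!r}")
```

Every sample reduces to `'Salmon'`, which suggests the pattern itself is fine and the problem lies in how it is passed to pandas.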

I am using the replace method on the names DataFrame to replace the regex matches with nothing, using the following code:

names_nohf = names.replace(r'(^[Mm]([Rr]|[Ss]|[Xx]|[Rr][Ss]|[Ii][Ss]+)\.?\s)|(^[Mm][Ii][Ss][Tt][Ee][Rr]\.?\s)|(^[Mm][Ii][Ss]+[Uu][Ss]\.?\s)', regex = True)

However, this does not return the desired names; in fact, it makes no changes at all. Could someone please point me in the right direction?

Upvotes: 1

Views: 1377

Answers (1)

furas

Reputation: 142814

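Use an empty string as the new value. `replace` needs a second argument (`value`) that says what to put in place of each match; without it, the regex matches are not substituted.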

import pandas as pd

data = {'X': ['Mr A', 'Mr B', 'Mr C']}

df = pd.DataFrame(data)
print(df)

df = df.replace('Mr', '', regex=True)  # '' is the replacement value for each match
print(df)

Result (each value keeps a leading space, because only 'Mr' is removed):

      X
0  Mr A
1  Mr B
2  Mr C

    X
0   A
1   B
2   C
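Applied to the question's own data, the fix is just the missing second argument. Below is a minimal sketch, assuming a DataFrame called `names` with a single `name` column as in the question. The shorter case-insensitive `short_pattern` and the `name_nohf` column are hypothetical additions for illustration and are not part of the original answer:

```python
import pandas as pd

# Sample data modelled on the question's "name" column.
names = pd.DataFrame(
    {"name": ["Mr. Salmon", "Mr Salmon", "Ms. Salmon", "Mrs. Salmon", "Mrs Salmon"]}
)

# The question's pattern, unchanged, split across lines for readability.
pattern = (
    r"(^[Mm]([Rr]|[Ss]|[Xx]|[Rr][Ss]|[Ii][Ss]+)\.?\s)"
    r"|(^[Mm][Ii][Ss][Tt][Ee][Rr]\.?\s)"
    r"|(^[Mm][Ii][Ss]+[Uu][Ss]\.?\s)"
)

# The fix: pass '' as the replacement value so each match is deleted.
names_nohf = names.replace(pattern, "", regex=True)
print(names_nohf)

# Hypothetical column-only alternative: with case=False, Series.str.replace
# ignores case, so the [Mm]-style character classes are no longer needed.
short_pattern = r"^m(?:rs?|s|x|is+|ister|is+us)\.?\s"
names["name_nohf"] = names["name"].str.replace(short_pattern, "", case=False, regex=True)
print(names)
```

`DataFrame.replace` with `regex=True` runs the substitution on every string cell in the frame. `Series.str.replace` limits it to one column, which avoids touching other text columns.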

Upvotes: 1
