Eobard Thawn

Reputation: 65

Can I remove a name prefix without corrupting the rest of the name?

I am trying to remove a title prefix from each name. I currently use re.sub to remove it, but some names also contain the prefix's characters elsewhere in the name, and those get removed too.

Example data:

-MisterClarkKent
-Mrs.Carol
-missjanedoemiss

I tried re.sub(r'(^\w{2,5}\ ?)', r'', name) to remove the prefix by fixed position, but that does not work because I have more than 10 prefixes and each one has a different length.

import re
name = 'mrjasontoddmr'
filter_name = re.sub(r'mr', r'', name)
print(filter_name)

# filter_name is 'jasontodd', but I want 'jasontoddmr'

The expected output is "jasontoddmr".

Upvotes: 0

Views: 209

Answers (1)

Michael Gardner

Reputation: 1813

You can limit the number of replacements and ignore case with the count and flags arguments of re.sub().

import re
names = ['MisterClarkKent','Mrs.Carol','missjanedoemiss', 'mrjasontoddmr']
filter_names = [re.sub(r'mrs?\.?|mister\s?|miss\s?', r'', name, count=1, flags=re.IGNORECASE) for name in names]

filter_names

Out[99]: ['ClarkKent', 'Carol', 'janedoemiss', 'jasontoddmr']

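The ? makes the preceding character optional, so in mrs?\.? both the s and the . are optional. The pattern therefore matches any of mr, mr., mrs and mrs. at once. Because count=1 stops after the first match, a trailing mr or miss is left in place.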
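Since the question mentions more than 10 prefixes, it may be easier to build the pattern from a list and anchor it to the start of the string with ^, so a title-like substring in the middle of a name is never touched. The following is only a sketch: the PREFIXES list and the strip_prefix helper are hypothetical names, and the list holds just the prefixes shown in the question.

```python
import re

# Hypothetical prefix list; extend it with the other titles in your data.
PREFIXES = ["mister", "miss", "mrs", "mr"]

# Try longer prefixes first so that "mrs" wins over "mr".
_alternatives = "|".join(
    re.escape(p) for p in sorted(PREFIXES, key=len, reverse=True)
)

# ^ anchors the match at the start of the name; \.? and \s* absorb an
# optional period and any spaces that follow the prefix.
PREFIX_RE = re.compile(rf"^(?:{_alternatives})\.?\s*", flags=re.IGNORECASE)


def strip_prefix(name):
    """Remove one leading title prefix, leaving the rest of the name intact."""
    return PREFIX_RE.sub("", name, count=1)


names = ["MisterClarkKent", "Mrs.Carol", "missjanedoemiss", "mrjasontoddmr", "JohnMrSmith"]
print([strip_prefix(n) for n in names])
```

One caveat: when there is no separator between prefix and name, a real name that merely starts with the same letters (for example "Missy") would also be shortened, so the prefix list should be checked against the actual data.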

Upvotes: 1
