user692734

Reputation: 668

Difference between re.sub and re.findall

I have strings that look like "Billboard Bill SpA". I want a regular expression that removes SpA, but only if a capitalised word precedes it. The regular expression I use is "[A-Z][a-z]*\s(SpA)". If I use re.sub, both SpA and the capitalised word before it get removed, which is expected.

re.sub("[A-Z][a-z]*\s(SpA)", "", "Billboard Bill SpA")
'Billboard '

However, if I use re.findall I get the functionality I need:

re.findall("[A-Z][a-z]*\s(SpA)", "Billboard Bill SpA")
['SpA']

I know I can write a lookbehind with "(?<=...)", which doesn't consume the preceding text, but that works only for fixed-length expressions. Does anybody know how I can remove only "SpA" with re.sub, or make it work like re.findall?

To be clearer, I want a regular expression that removes SpA, but only if there is a capitalised word before it:

re.sub(regular_expression, "", "Billboard Bill SpA") -> Billboard Bill
re.sub(regular_expression, "", "to SpA") -> to SpA

Upvotes: 1

Views: 667

Answers (2)

Jason S

Reputation: 13779

Your re.sub is replacing the entire match, not just the group (SpA). That's why it's also removing Bill. findall, on the other hand, returns only the group when the pattern contains exactly one.
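To make the difference concrete, here is a small sketch using the pattern and string from the question. The variable names are only illustrative; the point is that group(0) is what re.sub replaces, while group(1) is what re.findall returns when the pattern has a single group.

```python
import re

pattern = r"[A-Z][a-z]*\s(SpA)"
text = "Billboard Bill SpA"

m = re.search(pattern, text)
whole_match = m.group(0)  # the full matched text, which re.sub replaces
group_one = m.group(1)    # the captured group, which re.findall returns

found = re.findall(pattern, text)
replaced = re.sub(pattern, "", text)

print(repr(whole_match))  # 'Bill SpA'
print(repr(group_one))    # 'SpA'
print(found)              # ['SpA']
print(repr(replaced))     # 'Billboard '
```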

In re.sub, you can capture the part of the match that you want to keep and put it back with a backreference in the replacement. Raw strings keep the backslashes readable.

re.sub(r"([A-Z][a-z]*\s)SpA", r"\1", "Billboard Bill SpA")
'Billboard Bill '

If you want to delete the space as well, move \s outside of the parentheses.
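As a sketch of that last variant, with \s moved outside the parentheses, and checked against both inputs from the question. The helper name strip_spa is made up for illustration and is not part of the re module.

```python
import re

def strip_spa(text):
    # Hypothetical helper: the group captures only the capitalised word,
    # so \1 restores it while the whitespace and "SpA" are dropped.
    return re.sub(r"([A-Z][a-z]*)\sSpA", r"\1", text)

print(repr(strip_spa("Billboard Bill SpA")))  # 'Billboard Bill'
print(repr(strip_spa("to SpA")))              # 'to SpA'
```

The second string is returned unchanged because "to" does not start with a capital letter, so the pattern never matches.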

Upvotes: 2

Ignacio Vazquez-Abrams

Reputation: 798814

Perform the substitution using groups.

>>> re.sub("([A-Z][a-z]*\s)(SpA)", "\\1", "Billboard Bill SpA")
'Billboard Bill '

Upvotes: 1
