Difference between re.sub and re.findall

Question

I have strings that look like "Billboard Bill SpA". I want a regular expression that removes SpA, but only if there is a capitalised word before it. The regular expression I use is "[A-Z][a-z]*\s(SpA)". If I use re.sub, both SpA and the capitalised word before it get removed, which is expected:

import re
re.sub(r"[A-Z][a-z]*\s(SpA)", "", "Billboard Bill SpA")
'Billboard '

However, re.findall gives me exactly what I need:

re.findall(r"[A-Z][a-z]*\s(SpA)", "Billboard Bill SpA")
['SpA']

I know I could use a lookbehind, "(?<=...)", which doesn't consume the preceding text, but in Python's re module that only works for fixed-length patterns. Does anybody know how to remove only "SpA" with re.sub, or how to make it behave like re.findall?
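The fixed-length restriction is easy to confirm. A minimal check, assuming the question's prefix "[A-Z][a-z]*\s" is simply moved into a lookbehind (the variable name lookbehind_ok is only for illustration):

```python
import re

# Put the variable-length prefix "[A-Z][a-z]*\s" inside a lookbehind.
# Python's built-in re module only accepts fixed-width lookbehinds,
# so compiling this pattern raises re.error.
try:
    re.compile(r"(?<=[A-Z][a-z]*\s)SpA")
    lookbehind_ok = True
except re.error as exc:
    lookbehind_ok = False
    print("re.error:", exc)
```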

To be clearer: I want a regular expression that removes SpA, but only if there is a capitalised word before it:

re.sub(regular_expression, "", "Billboard Bill SpA") -> Billboard Bill
re.sub(regular_expression, "", "to SpA") -> to SpA

Jason S · Accepted Answer

Your re.sub replaces the entire match, not just the group (SpA). That's why it also removes Bill. re.findall, on the other hand, returns only the contents of the capture group when the pattern contains one.
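To see the difference directly, you can inspect the match object: group(0) is the whole match that re.sub replaces, while group(1) is the capture group that re.findall reports. A small sketch using the question's pattern and input:

```python
import re

pattern = r"[A-Z][a-z]*\s(SpA)"
text = "Billboard Bill SpA"

m = re.search(pattern, text)
print(repr(m.group(0)))  # 'Bill SpA': the whole match, which re.sub replaces
print(repr(m.group(1)))  # 'SpA': capture group 1, which re.findall returns

print(repr(re.sub(pattern, "", text)))  # 'Billboard '
print(re.findall(pattern, text))        # ['SpA']
```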

With re.sub, you can capture the part of the match you don't want to delete and put it back with a backreference (\1) in the replacement string:

re.sub(r"([A-Z][a-z]*\s)SpA", r"\1", "Billboard Bill SpA")
'Billboard Bill '

If you want to delete the space as well, move \s outside the parentheses.
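A minimal sketch of that variant, checked against both examples from the question (strip_spa is a hypothetical helper name, not part of the original answer):

```python
import re

# Group 1 keeps only the capitalised word; the space and "SpA" sit outside
# the group, so both disappear when the match is replaced by r"\1".
pattern = r"([A-Z][a-z]*)\sSpA"

def strip_spa(text):
    # Hypothetical helper: remove "SpA" only when a capitalised word precedes it.
    return re.sub(pattern, r"\1", text)

print(strip_spa("Billboard Bill SpA"))  # Billboard Bill
print(strip_spa("to SpA"))              # to SpA (no capitalised word before it)
```

Passing a function such as lambda m: m.group(1) as the replacement argument of re.sub would have the same effect; the r"\1" form is just shorter.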
