Reputation: 558
I have a series of strings in which the entries I care about can be identified by a "p" tag followed by at least two CAPITAL letters.
Input:
<p>JIM <p>SALLY <p>ROBERT <p>Eric
I want to change the "p" tag to an "i" tag if it's followed by those two capital letters (so not the last one, 'Eric').
Desired output:
<i>JIM <i>SALLY <i>ROBERT <p>Eric
I've tried this using regular expressions in Python:
import re
Mytext = "<p>JIM <p>SALLY <p>ROBERT <p>Eric"
changeTags = re.sub('<p>[A-Z]{2}', '<i>' + re.search('<p>[A-Z]{2}', Mytext).group()[-2:], Mytext)
print changeTags
But the output uses the "i" tag + JI in every instance, rather than iterating through to use SA and then RO in entries 2 and 3.
<i>JIM <i>JILLY <i>JIBERT <p>Eric
I believe the problem is that I don't understand the .group() method properly. Can anyone advise what I've done wrong?
Thank you.
Upvotes: 1
Views: 425
Reputation: 14945
Another way using look-ahead assertion:
re.sub(r'<p>(?=[A-Z]{2,})', '<i>', Mytext)
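A quick check of the look-ahead version against the question's input, as a minimal sketch (the names `text` and `result` are mine, not from the post). Since the look-ahead consumes no characters, the capital letters stay in place and no group is needed; `{2}` is enough inside the assertion because it only has to see two capitals.

```python
import re

text = "<p>JIM <p>SALLY <p>ROBERT <p>Eric"

# (?=[A-Z]{2}) asserts that two capitals follow, without consuming them
result = re.sub(r'<p>(?=[A-Z]{2})', '<i>', text)
print(result)  # <i>JIM <i>SALLY <i>ROBERT <p>Eric
```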
Upvotes: 1
Reputation: 64308
Your inner re.search is only evaluated once, and the result is passed as one of the parameters to re.sub. This can't possibly capture all the capital-letter pairs, only the first one. This means your approach cannot work, and the problem is not merely your understanding of groups.
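To make the "evaluated once" point concrete, here is the question's call split into two steps. This is only an illustrative sketch; the name `replacement` is introduced here and is not in the original code.

```python
import re

Mytext = "<p>JIM <p>SALLY <p>ROBERT <p>Eric"

# The second argument is computed before re.sub runs, so it is a fixed string.
replacement = '<i>' + re.search('<p>[A-Z]{2}', Mytext).group()[-2:]
print(replacement)  # <i>JI

# Every match of '<p>[A-Z]{2}' is then replaced by that same fixed string.
changeTags = re.sub('<p>[A-Z]{2}', replacement, Mytext)
print(changeTags)  # <i>JIM <i>JILLY <i>JIBERT <p>Eric
```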
Furthermore, using groups is unnecessary.
You need to capture the capital letters using parentheses, and reference them as \1 in the substitution expression:
re.sub('<p>([A-Z]{2})', r'<i>\1', Mytext)
\1 here means: replace with the substring matched by the first (...) in the regular expression. (docs)
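Run against the input from the question, the backreference version looks like this. It is a small sketch; the name `fixed` is mine.

```python
import re

Mytext = "<p>JIM <p>SALLY <p>ROBERT <p>Eric"

# ([A-Z]{2}) captures the two capitals; \1 puts them back after the new tag
fixed = re.sub('<p>([A-Z]{2})', r'<i>\1', Mytext)
print(fixed)  # <i>JIM <i>SALLY <i>ROBERT <p>Eric
```

Each match gets its own \1, so SA and RO are preserved rather than being overwritten with JI.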
Note the leading r in front of the substitution string, to make it raw.
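Why the raw prefix matters, as a short sketch (the names `raw_result` and `plain_result` are only for illustration): in an ordinary string literal Python itself interprets '\1' as an octal escape, so re.sub never sees a backreference.

```python
import re

Mytext = "<p>JIM <p>SALLY <p>ROBERT <p>Eric"

# In a normal literal, '\1' is the single control character chr(1).
print('\1' == chr(1))   # True
# In a raw literal, it stays as a backslash followed by the digit 1.
print(len(r'\1'))       # 2

raw_result = re.sub('<p>([A-Z]{2})', r'<i>\1', Mytext)
plain_result = re.sub('<p>([A-Z]{2})', '<i>\1', Mytext)

print(raw_result)           # <i>JIM <i>SALLY <i>ROBERT <p>Eric
print(repr(plain_result))   # the capitals are replaced by '\x01'
```

Writing '<i>\\1' with a doubled backslash would also work, but the raw string is easier to read.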
Upvotes: 0