Reputation: 580
Suppose that:
s = '<A HREF="http://www.google.com" ID="test">blah</A>'
I want to prepend the url with "url: ", so I tried:
s = re.sub(r'href="([\w:/.]+)"', "url: " + r'\1', s, re.I)
but this does not change s.
Upvotes: 2
Views: 878
Reputation: 91017
While the other answer is technically correct, I don't think its result is what you actually want.
Instead, you might want to work with a match object:
m = re.search(r'href="([\w:/.]+)"', s, re.I)
print(m.expand(r"url: \1"))
which results in
url: http://www.google.com
without the <A before and the ID="test">blah</A> behind.
If you want to do more of these replacements, you might even want to reuse the regex by compiling it:
r = re.compile(r'href="([\w:/.]+)"', re.I)
ex = lambda st: r.search(st).expand(r"url: \1")
print(ex('<A HREF="http://www.google.com" ID="test">blah</A>'))
print(ex('<A HREF="http://www.yahoo.com" ID="test">blah</A>'))
# and so on.
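One caveat with the lambda above: search returns None when a string contains no href attribute, so the chained .expand call raises AttributeError. A minimal sketch of a more defensive variant follows. The helper name extract_url and the choice to return None on no match are assumptions for illustration, not part of the re module.

```python
import re

# Compile once and reuse; re.I lets the pattern match HREF as well as href.
HREF_RE = re.compile(r'href="([\w:/.]+)"', re.I)

def extract_url(text):
    """Return 'url: <address>' for the first href in text, or None if absent.

    extract_url is a hypothetical helper name used only for this sketch.
    """
    m = HREF_RE.search(text)
    if m is None:
        # No href attribute found; avoid calling .expand on None.
        return None
    return m.expand(r"url: \1")

print(extract_url('<A HREF="http://www.google.com" ID="test">blah</A>'))
print(extract_url('<A ID="test">blah</A>'))
```

The first call prints the prefixed URL; the second prints None instead of raising.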
If, however, you do want to keep the HTML around it, you can use lookahead and lookbehind assertions:
re.sub(r'(?<=href=")([\w:/.]+)(?=")', "url: " + r'\1', s, flags=re.I)
# -> '<A HREF="url: http://www.google.com" ID="test">blah</A>'
or simply repeat the surrounding text in the replacement:
re.sub(r'href="([\w:/.]+)"', r'href="url: \1"', s, flags=re.I)
# -> '<A href="url: http://www.google.com" ID="test">blah</A>'
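Note that hard-coding href in the replacement rewrites the attribute name in lowercase, as the output above shows. If the original spelling should survive, one possible approach (a sketch, not the only option) is to capture the prefix in its own group and put it back with a backreference:

```python
import re

s = '<A HREF="http://www.google.com" ID="test">blah</A>'

# Group 1: the attribute name and opening quote exactly as written in the input.
# Group 2: the URL itself.
result = re.sub(r'(href=")([\w:/.]+)"', r'\1url: \2"', s, flags=re.I)
print(result)
```

This gives the same result as the lookaround version while keeping the pattern free of lookbehind, which in Python's re module must be fixed-width.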
Upvotes: 2
Reputation: 500257
The re.I is in the wrong position (it's being interpreted as the count argument).
From the documentation:
re.sub(pattern, repl, string, count=0, flags=0)
                              ^^^^^    ^^^^^
Try:
In [27]: re.sub(r'href="([\w:/.]+)"', "url: " + r'\1', s, flags=re.I)
Out[27]: '<A url: http://www.google.com ID="test">blah</A>'
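To see why the original call silently did nothing, it helps to remember that re.I is an integer-valued flag. The sketch below passes it as count by keyword, which is assumed here to be equivalent to what the positional call in the question ends up doing: count is 2, flags stays 0, and the lowercase pattern never matches the uppercase HREF.

```python
import re

s = '<A HREF="http://www.google.com" ID="test">blah</A>'
pattern = r'href="([\w:/.]+)"'

# re.I has the integer value 2, so it is accepted wherever an int is expected.
print(int(re.I))

# Effectively what the question's call did: count=2, flags=0 (case-sensitive).
unchanged = re.sub(pattern, r"url: \1", s, count=int(re.I))
print(unchanged == s)

# Passing the flag by keyword makes the match case-insensitive.
fixed = re.sub(pattern, r"url: \1", s, flags=re.I)
print(fixed)
```

Passing flags by keyword avoids this class of mistake entirely, since no error is raised when a flag lands in the count slot.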
Upvotes: 4