Frank Epps

Reputation: 580

re.sub in Python not working

Suppose that:

s = '<A HREF="http://www.google.com" ID="test">blah</A>'

I want to prepend "url: " to the URL, so I tried:

s = re.sub(r'href="([\w:/.]+)"', "url: " + r'\1', s, re.I)

but this does not change s.

Upvotes: 2

Views: 878

Answers (2)

glglgl

Reputation: 91017

While the other answer is technically correct, I don't think the result it produces is what you actually want.

Instead, you might want to work with a match object:

m = re.search(r'href="([\w:/.]+)"', s, re.I)
print(m.expand(r"url: \1"))

which results in

url: http://www.google.com

without the <A before it and the ID="test">blah</A> after it.

If you want to do more of these replacements, you might even want to reuse the regex by compiling it:

r = re.compile(r'href="([\w:/.]+)"', re.I)
ex = lambda st: r.search(st).expand(r"url: \1")
print(ex('<A HREF="http://www.google.com" ID="test">blah</A>'))
print(ex('<A HREF="http://www.yahoo.com" ID="test">blah</A>'))
# and so on.
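
One caveat with the lambda above: r.search() returns None when the input has no href, so calling .expand() on the result would raise an AttributeError. A minimal sketch of a guarded variant follows; the names HREF_RE and extract_url are made up for illustration and are not from the original answer.

```python
import re

# Compiled once and reused; same pattern as in the answer above.
HREF_RE = re.compile(r'href="([\w:/.]+)"', re.I)

def extract_url(html):
    """Return 'url: <target>' for the first href in html, or None if none is found."""
    m = HREF_RE.search(html)
    if m is None:
        # No href attribute matched; avoid calling .expand() on None.
        return None
    return m.expand(r"url: \1")

print(extract_url('<A HREF="http://www.yahoo.com" ID="test">blah</A>'))  # url: http://www.yahoo.com
print(extract_url('<B>no link here</B>'))                                # None
```

Returning None is only one possible choice here; raising a ValueError or returning the input unchanged may suit other callers better.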

If, however, you do want to keep the HTML around it, you can use lookbehind and lookahead assertions, which match the surrounding text without consuming it:

re.sub(r'(?<=href=")([\w:/.]+)(?=")', "url: " + r'\1', s, flags=re.I)
# -> '<A HREF="url: http://www.google.com" ID="test">blah</A>'

or simply repeat the consumed text in the replacement:

re.sub(r'href="([\w:/.]+)"', r'href="url: \1"', s, flags=re.I)
# -> '<A href="url: http://www.google.com" ID="test">blah</A>'
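
Note that the last variant rewrites HREF as lowercase href, because the attribute name is typed literally in the replacement. If the original casing should survive, one option (a sketch, not part of the original answer) is to capture the attribute name in its own group and echo it back:

```python
import re

s = '<A HREF="http://www.google.com" ID="test">blah</A>'

# Group 1 captures the attribute name and opening quote exactly as written,
# group 2 captures the URL; the closing quote is matched and re-emitted literally.
result = re.sub(r'(href=")([\w:/.]+)"', r'\1url: \2"', s, flags=re.I)

print(result)  # <A HREF="url: http://www.google.com" ID="test">blah</A>
```

This gives the same output as the lookbehind/lookahead version while using only plain capture groups.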

Upvotes: 2

NPE

Reputation: 500257

The re.I is in the wrong position (it's being interpreted as the count argument).

From the documentation:

re.sub(pattern, repl, string, count=0, flags=0)
                              ^^^^^    ^^^^^

Try:

In [27]: re.sub(r'href="([\w:/.]+)"', "url: " + r'\1', s, flags=re.I)
Out[27]: '<A url: http://www.google.com ID="test">blah</A>'
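
To see why the original call left s unchanged, here is a minimal sketch. It passes re.I as count by keyword, which is what the positional call in the question amounts to; the variable names as_count and as_flags are only illustrative.

```python
import re

s = '<A HREF="http://www.google.com" ID="test">blah</A>'
pattern = r'href="([\w:/.]+)"'

# re.I is an integer flag with value 2, so in the fourth position it is
# read as count=2 and the match stays case-sensitive.
as_count = re.sub(pattern, r"url: \1", s, count=re.I)

# Passed by keyword as flags, it enables case-insensitive matching.
as_flags = re.sub(pattern, r"url: \1", s, flags=re.I)

print(int(re.I))      # 2
print(as_count == s)  # True: lowercase "href" never matches "HREF"
print(as_flags)       # <A url: http://www.google.com ID="test">blah</A>
```

No error is raised in the first case because 2 is a perfectly valid count; the pattern simply never matches.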

Upvotes: 4
