KameeCoding

Reputation: 723

Is there any way to modify the text of a capture group inside re.sub?

I replace each link with the link's text:

text = re.sub('<a href=\".*?\">(.*?)</a>','\\1',text)

Example:

>>> text = '<a href="SOME URL">SOME URL</a>'
>>> text = re.sub('<a href=\".*?\">(.*?)</a>', '\\1', text)
>>> print(text)
SOME URL

I would like it to output some_url instead.

But appending .lower().replace(' ','_') to the replacement doesn't help:

>>> text = re.sub('<a href=\".*?\">(.*?)</a>', '\\1'.lower().replace(' ','_'), text)
>>> print(text)
SOME URL
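The attempt above fails because Python evaluates '\\1'.lower().replace(' ','_') before re.sub is even called: the string methods act on the literal two-character template (a backslash followed by 1), which has no uppercase letters or spaces, so re.sub receives exactly the same template as before. A small check (the variable name template is illustrative only):

```python
import re

# The methods run on the template string itself, not on the captured text.
template = '\\1'.lower().replace(' ', '_')
print(template == '\\1')  # True: nothing in the template changed

text = '<a href="SOME URL">SOME URL</a>'
# re.sub then expands \1 to the unmodified captured group.
print(re.sub(r'<a href=".*?">(.*?)</a>', template, text))  # SOME URL
```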

Upvotes: 0

Views: 43

Answers (2)

enthus1ast

Reputation: 2109

For this kind of task I would consider a more mature package, e.g. Beautiful Soup:

>>> from bs4 import BeautifulSoup
>>> BeautifulSoup('<a href="SOME URL">SOME URL</a>', 'html.parser').find("a").text
u'SOME URL'
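If installing Beautiful Soup isn't an option, roughly the same idea can be sketched with the standard library's html.parser.HTMLParser. This is a swapped-in technique, not what the answer above uses; LinkTextCollector and link_slugs are hypothetical names, the sketch assumes Python 3, and it only collects and normalizes link text rather than rewriting the surrounding HTML:

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Hypothetical helper: gathers the text inside each <a> element."""

    def __init__(self):
        super().__init__()
        self.in_link = False  # True between <a ...> and </a>
        self.links = []       # one accumulated text string per link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.links.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Text from tags nested inside the link (e.g. <b>) is included too.
        if self.in_link:
            self.links[-1] += data


def link_slugs(html):
    """Hypothetical helper: lowercase each link's text, spaces -> underscores."""
    parser = LinkTextCollector()
    parser.feed(html)
    parser.close()
    return [text.lower().replace(" ", "_") for text in parser.links]


print(link_slugs('<a href="SOME URL">SOME URL</a>'))  # ['some_url']
```

Unlike a regex, a parser also picks up link text split across nested tags, such as <a href="x"><b>Foo</b> Bar</a>.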

Upvotes: 1

Adam Smith

Reputation: 54223

Sure. re.sub accepts a callable for its repl argument: it is called with each match object, and the string it returns is used as the replacement. The docs make this pretty clear, but here's an example:

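import re

text = '<a href="SOME URL">SOME URL</a>'

# re.sub calls the lambda once per match; its return value replaces the whole match.
text = re.sub(r'<a href=".*?">(.*?)</a>',
              lambda match: match.group(1).lower().replace(' ', '_'),
              text)
print(text)  # some_url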
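When a string contains several links, a named callback can read more clearly than a lambda. A minimal sketch, assuming the same simple <a href="..."> markup; LINK_RE and link_to_slug are illustrative names, not part of the original answer:

```python
import re

# Non-greedy groups keep each match confined to a single <a>...</a> element.
LINK_RE = re.compile(r'<a href=".*?">(.*?)</a>')


def link_to_slug(match):
    """Replacement callback: receives a match object and returns the text
    that replaces the whole match."""
    return match.group(1).lower().replace(" ", "_")


html = 'See <a href="A">SOME URL</a> and <a href="B">Other Page</a>.'
print(LINK_RE.sub(link_to_slug, html))
# See some_url and other_page.
```

The compiled pattern can be reused by name; the module-level re.sub(pattern, link_to_slug, html) behaves the same.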

Upvotes: 2
