Reputation: 723
I replace each link with the text of the link:
text = re.sub('<a href=\".*?\">(.*?)</a>','\\1',text)
example:
>>>text='<a href="SOME URL">SOME URL</a>'
>>>text = re.sub('<a href=\".*?\">(.*?)</a>','\\1',text)
>>>print text
SOME URL
I would like it to output some_url instead,
but adding .lower().replace(' ','_') to the replacement string doesn't help:
>>>text = re.sub('<a href=\".*?\">(.*?)</a>','\\1'.lower().replace(' ','_'),text)
SOME URL
Upvotes: 0
Views: 43
Reputation: 2109
For this kind of task I would consider a more mature package, e.g. Beautiful Soup:
from bs4 import BeautifulSoup
BeautifulSoup('<a href="SOME URL">SOME URL</a>', 'html.parser').find("a").text
u'SOME URL'
Upvotes: 1
Reputation: 54223
Sure. re.sub accepts a callable for its repl argument. The docs make it pretty clear, but here's an example:
import re
re.sub(r'<a href=\".*?\">(.*?)</a>',
       lambda match: match.group(1).lower().replace(' ','_'),
       text)
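As for why the original attempt has no effect: '\\1'.lower().replace(' ','_') is evaluated before re.sub is ever called. The two-character string \1 contains no uppercase letters and no spaces, so it comes back unchanged and re.sub receives the same replacement as before. Here is a minimal self-contained sketch of the callable approach; the helper name slugify_links and the sample string are made up for illustration.

```python
import re

# Matches an anchor tag and captures its visible text in group 1.
LINK_RE = re.compile(r'<a href=".*?">(.*?)</a>')

def slugify_links(text):
    """Replace each link with its text, lowercased, with spaces as underscores."""
    # re.sub calls the function once per match and inserts its return value.
    return LINK_RE.sub(lambda m: m.group(1).lower().replace(' ', '_'), text)

sample = 'See <a href="SOME URL">SOME URL</a> for details.'
result = slugify_links(sample)
print(result)
```

Because the function runs once per match, it sees the actual captured text rather than the backreference placeholder, so any string method can be applied to it.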
Upvotes: 2