Reputation: 1600
I am having trouble with a regular expression. I am checking strings that contain a tag like:
<a href="/abc/def/ghk/">test_test</a>
I want to capture only the /abc/def/ghk portion using a regular expression.
I am using Python and have tried several different expressions.
Upvotes: 1
Views: 159
Reputation: 298166
I'd use BeautifulSoup, as it's made for doing things like this:
>>> from bs4 import BeautifulSoup  # BeautifulSoup 4; the old "BeautifulSoup" module is version 3
>>> soup = BeautifulSoup('<a href="/abc/def/ghk/">test_test</a>', 'html.parser')
>>> print(soup.find_all('a', href=True)[0]['href'])  # first <a> that has an href
/abc/def/ghk/
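Note that the parser returns the attribute verbatim, including the trailing slash, while the question asks for /abc/def/ghk. Below is a minimal sketch of the same parse-then-read-the-attribute idea using only the standard library's html.parser (a swapped-in technique, not BeautifulSoup), with the trailing slash stripped afterwards. The names HrefCollector and extract_hrefs are made up for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(markup):
    """Return the href of each <a> tag in markup, without trailing slashes."""
    parser = HrefCollector()
    parser.feed(markup)
    parser.close()
    # rstrip("/") drops trailing slashes; note a bare "/" would become ""
    return [href.rstrip("/") for href in parser.hrefs]


print(extract_hrefs('<a href="/abc/def/ghk/">test_test</a>'))
```

With the tag from the question this should give a one-element list containing /abc/def/ghk.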
Upvotes: 4
Reputation: 414189
You could use lxml to work with links:
from lxml import html
# iterlinks yields (element, attribute, link, pos) for every link in the markup
for _, attr, link, _ in html.iterlinks('<a href="/abc/def/ghk/">test_test</a>'):
    if attr == 'href':
        print(link)
/abc/def/ghk/
Upvotes: 1
Reputation: 43077
Is this sufficient?
>>> import re
>>> tags = '<a href="/abc/def/ghk/">test_test</a>'
>>> re.search(r'<a\s+href="(\S+?)/"', tags).group(1)
'/abc/def/ghk'
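The pattern above assumes that href is the first attribute, that the value is double-quoted, and that it ends in a slash. If the input may vary, a somewhat more tolerant sketch could look like the following. The names HREF_RE and href_paths are made up for illustration, and this is still a regex, so it will misfire on some markup (for example, it would also match an attribute such as data-href).

```python
import re

# <a ... href = "value" or 'value'; group 1 is the quote, group 2 the value
HREF_RE = re.compile(r'<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1', re.IGNORECASE)


def href_paths(markup):
    """Return href values of <a> tags in markup, trailing slashes removed."""
    return [m.group(2).rstrip("/") for m in HREF_RE.finditer(markup)]


print(href_paths('<a href="/abc/def/ghk/">test_test</a>'))
print(href_paths("<a class='x' href='/abc/def/ghk'>test_test</a>"))
```

Both calls should return a list containing only /abc/def/ghk. For anything beyond well-known, simple input, an HTML parser is the safer choice.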
Upvotes: 1