mushfiq

Reputation: 1600

Regular expression for checking a tag and extracting a specific portion of the string

I am having trouble with a regular expression. I am checking strings that contain a tag such as <a href="/abc/def/ghk/">test_test</a>, and I want to capture only the /abc/def/ghk portion (the href value without its trailing slash).

I am using Python and have tried several different expressions without success.

Upvotes: 1

Views: 159

Answers (3)

Blender

Reputation: 298166

I'd use BeautifulSoup, as it's made for doing things like this:

>>> from bs4 import BeautifulSoup  # BeautifulSoup 4; the old `BeautifulSoup` module is Python 2 only
>>> soup = BeautifulSoup('<a href="/abc/def/ghk/">test_test</a>', 'html.parser')
>>> print(soup.find_all('a', href=True)[0]['href'])
/abc/def/ghk/
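
If installing a third-party parser is not an option, the same parser-based idea can be sketched with the standard library's html.parser module. This is only a minimal illustration, assuming Python 3; the class name HrefCollector and the variable paths are made up for this example and are not part of any library.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


collector = HrefCollector()
collector.feed('<a href="/abc/def/ghk/">test_test</a>')
collector.close()

# Drop the trailing slash to get the portion the question asks for.
paths = [href.rstrip("/") for href in collector.hrefs]
print(paths)
```

For the tag in the question this prints ['/abc/def/ghk']. Stripping the slash after parsing keeps the parsing step and the string clean-up separate.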

Upvotes: 4

jfs

Reputation: 414189

You could use lxml to work with links:

from lxml import html

for _, attr, link, _ in html.iterlinks('<a href="/abc/def/ghk/">test_test</a>'):
    if attr == 'href':
        print(link)

Output

/abc/def/ghk/

Upvotes: 1

Mike Pennington

Reputation: 43077

Is this sufficient?

>>> import re
>>> tags = '<a href="/abc/def/ghk/">test_test</a>'
>>> re.search(r'<a\s+href="(\S+?)/"', tags).group(1)
'/abc/def/ghk'
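
The pattern above assumes href is the first attribute and that the value always ends in a slash. A slightly more tolerant variant might look like the sketch below. The helper name extract_href_path and the constant HREF_RE are hypothetical, the pattern only handles double-quoted values, and a regex like this remains fragile on real-world HTML.

```python
import re

# Hypothetical pattern: allows other attributes before href, but requires
# whitespace directly before it so that e.g. data-href is not picked up.
HREF_RE = re.compile(r'<a\s+(?:[^>]*?\s)?href="([^"]*)"', re.IGNORECASE)


def extract_href_path(tag):
    """Return the href of the first <a> tag without trailing slashes, or None."""
    match = HREF_RE.search(tag)
    if match is None:
        return None
    # Strip the trailing slash after matching instead of encoding it in the pattern.
    return match.group(1).rstrip("/")


print(extract_href_path('<a href="/abc/def/ghk/">test_test</a>'))
```

Returning None when nothing matches avoids the AttributeError that calling .group(1) directly on a failed re.search would raise.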

Upvotes: 1
