angel_30

Reputation: 1

Get specific links with target in Python BeautifulSoup

I'm trying to parse an HTML source with Python using BeautifulSoup. I need to get the href of specific links (<a> tags). What distinguishes those links is that they all include target='testwindow' in their tags, so maybe I can look for that. How can I get those links?

This is my test sample. I need to get only http://example.com:20213/testweb1.2/testapp?WSDL from the second link, the one with target="testwindow".

<td id="link3"><img src="images/spacer.gif" alt="" style="height:1px;" width="0" border="0"><a href="http://example.com:20213/testweb1.2/testapp?WSDL">?HELLO</a></td>
<td id="link4"><img src="images/spacer.gif" alt="" style="height:1px;" width="0" border="0"><a href="http://example.com:20213/testweb1.2/testapp?WSDL" target="testwindow">?WSDL</a></td>

Upvotes: 0

Views: 2481

Answers (1)

Ajax1234

Reputation: 71451

You can use BeautifulSoup.find:

from bs4 import BeautifulSoup
content = '<td id="link4"><img src="images/spacer.gif" alt="" style="height:1px;" width="0" border="0"><a href="http://example.com:20213/testweb1.2/testapp?WSDL" target="testwindow">?WSDL</a></td>'
# find() returns the first <a> tag whose target attribute equals 'testwindow'
link = BeautifulSoup(content, 'html.parser').find('a', {'target': 'testwindow'})
href = link['href']

Output:

'http://example.com:20213/testweb1.2/testapp?WSDL'
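
Since the question asks about links in the plural, here is a minimal sketch of the same idea using find_all, run against both rows of the question's sample. The variable names (html, matches) are only illustrative, and requiring href to be present is an extra assumption so that tags without an href are skipped.

```python
from bs4 import BeautifulSoup

# Both rows from the question's sample; only the second <a> has target="testwindow".
html = (
    '<td id="link3"><img src="images/spacer.gif" alt="" style="height:1px;" width="0" border="0">'
    '<a href="http://example.com:20213/testweb1.2/testapp?WSDL">?HELLO</a></td>'
    '<td id="link4"><img src="images/spacer.gif" alt="" style="height:1px;" width="0" border="0">'
    '<a href="http://example.com:20213/testweb1.2/testapp?WSDL" target="testwindow">?WSDL</a></td>'
)

soup = BeautifulSoup(html, 'html.parser')

# find_all() returns every matching tag; href=True keeps only tags that have an href.
matches = [
    (a['href'], a.get_text())
    for a in soup.find_all('a', attrs={'target': 'testwindow', 'href': True})
]

for href, text in matches:
    print(href, text)
```

Note that find returns None when nothing matches, so indexing its result with ['href'] would raise a TypeError on a page without such a link. The list built with find_all is simply empty in that case.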

Upvotes: 2
