Reputation: 1
I'm trying to parse an HTML source with Python using BeautifulSoup. I need to get the href of specific links (<a> tags). Those links all include target='testwindow' inside their tags, so maybe I can filter on that. How can I get those links?
This is my test sample. I would need to get only http://example.com:20213/testweb1.2/testapp?WSDL.
<td id="link3"><img src="images/spacer.gif" alt="" style="height:1px;" width="0" border="0"><a href="http://example.com:20213/testweb1.2/testapp?WSDL">?HELLO</a></td>
<td id="link4"><img src="images/spacer.gif" alt="" style="height:1px;" width="0" border="0"><a href="http://example.com:20213/testweb1.2/testapp?WSDL" target="testwindow">?WSDL</a></td>
Upvotes: 0
Views: 2481
Reputation: 71451
You can use BeautifulSoup.find with an attribute filter; it returns the first matching tag:
from bs4 import BeautifulSoup as soup
content = '<td id="link4"><img src="images/spacer.gif" alt="" style="height:1px;" width="0" border="0"><a href="http://example.com:20213/testweb1.2/testapp?WSDL" target="testwindow">?WSDL</a></td>'
# Find the first <a> whose target attribute equals 'testwindow', then read its href
d = soup(content, 'html.parser').find('a', {'target':'testwindow'})['href']
Output:
'http://example.com:20213/testweb1.2/testapp?WSDL'
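If the page may contain several such links, or none at all, find_all is probably the safer choice, since find returns None when nothing matches and indexing None with ['href'] raises a TypeError. The following is a minimal sketch, not the answer's original code: the helper name hrefs_with_target is hypothetical, and the sample markup is a trimmed version of the two cells from the question.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's sample: only the second link has target="testwindow".
html = '''
<td id="link3"><a href="http://example.com:20213/testweb1.2/testapp?WSDL">?HELLO</a></td>
<td id="link4"><a href="http://example.com:20213/testweb1.2/testapp?WSDL" target="testwindow">?WSDL</a></td>
'''

def hrefs_with_target(markup, target="testwindow"):
    """Hypothetical helper: return the href of every <a> with the given target."""
    parsed = BeautifulSoup(markup, "html.parser")
    # href=True skips anchors that have no href attribute at all.
    anchors = parsed.find_all("a", attrs={"target": target}, href=True)
    return [a["href"] for a in anchors]

links = hrefs_with_target(html)
print(links)
```

With the sample above, only the link in the link4 cell is returned; an unmatched target gives an empty list instead of an exception. The CSS selector form, parsed.select('a[target="testwindow"]'), should select the same tags.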
Upvotes: 2