Reputation: 799
Sorry, this has most likely been asked before, but I can't find an answer on Stack Overflow or through a search engine.
I'm trying to scrape some data from a table, and I need to get the href links it contains. The HTML is as follows:
<table class="featprop results">
<tr>
**1)**<td class="propname" colspan="2"><a href="/lettings-search-results?task=View&itemid=136" rel="nofollow"> West Drayton</a></td>
</tr>
<tr><td class="propimg" colspan="2">
<div class="imgcrop">
**2)**<a href="/lettings-search-results?task=View&itemid=136" rel="nofollow"><img src="content/images/1/1/641/w296/858.jpg" alt=" Ashford" width="148"/></a>
<div class="let"> </div>
</div>
</td></tr>
<tr><td class="proprooms">
So far I have used the following:
for table in soup.findAll('table', {'class': 'featprop results'}):
    for tr in table.findAll('tr'):
        for a in tr.findAll('a'):
            print(a)
This prints both 1) and 2) from the HTML above as whole tags. Could anyone help me extract just the href link from each?
Upvotes: 0
Views: 2950
Reputation: 12168
for table in soup.findAll('table', {'class': 'featprop results'}):
    for tr in table.findAll('tr'):
        for a in tr.findAll('a'):
            print(a['href'])
out:
/lettings-search-results?task=View&itemid=136
/lettings-search-results?task=View&itemid=136
EDIT:
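import re
links = set()  # a set removes the duplicates
for table in soup.findAll('table', {'class': 'featprop results'}):
    # \? matches a literal question mark; a bare ? would only make the final "s" optional
    for a in table.findAll('a', href=re.compile(r'^/lettings-search-results\?')):
        links.add(a['href'])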
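For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The `html` string is a trimmed copy of the markup from the question (with `&` written as `&amp;`), the built-in `html.parser` backend is an assumption on my part, and `find_all` is the current spelling of `findAll`. Treat it as an illustration, not the only way to do it.

```python
import re
from bs4 import BeautifulSoup

# Trimmed sample of the markup from the question; the real page is assumed to look similar.
html = """
<table class="featprop results">
<tr><td class="propname" colspan="2"><a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"> West Drayton</a></td></tr>
<tr><td class="propimg" colspan="2"><div class="imgcrop">
<a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"><img src="content/images/1/1/641/w296/858.jpg" alt=" Ashford" width="148"/></a>
</div></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# \? matches a literal question mark after the path.
pattern = re.compile(r"^/lettings-search-results\?")

links = set()  # a set keeps each href only once
for table in soup.find_all("table", {"class": "featprop results"}):
    for a in table.find_all("a", href=pattern):
        links.add(a["href"])

print(links)
```

Both anchors point at the same URL, so the set ends up with a single entry. Note that a set does not keep the order in which the links appeared on the page.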
Upvotes: 2
Reputation: 183
This gives you a list of the a tags found under the element with the selected class name.
result = soup.select(".featprop a")
for a in result:
    print(a['href'])
This gives the following result:
/lettings-search-results?task=View&itemid=136
/lettings-search-results?task=View&itemid=136
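If the duplicate matters, the selector approach can be tightened a little. The sketch below is one possible variant, not something from the original page: the selector `table.featprop.results a[href]` (both classes required, and only anchors that actually have an href) and the `dict.fromkeys` trick for order-preserving de-duplication are my own choices, and the `html` string is a trimmed copy of the question's markup.

```python
from bs4 import BeautifulSoup

# Trimmed sample of the markup from the question; the real page is assumed to look similar.
html = """
<table class="featprop results">
<tr><td class="propname" colspan="2"><a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"> West Drayton</a></td></tr>
<tr><td class="propimg" colspan="2"><div class="imgcrop">
<a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"><img src="content/images/1/1/641/w296/858.jpg" alt=" Ashford" width="148"/></a>
</div></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Every <a> with an href inside a table that carries both classes.
hrefs = [a["href"] for a in soup.select("table.featprop.results a[href]")]

# dict.fromkeys drops repeats while keeping first-seen order.
unique_hrefs = list(dict.fromkeys(hrefs))

print(hrefs)
print(unique_hrefs)
```

Here `hrefs` still contains the link twice (once for the text anchor, once for the image anchor), while `unique_hrefs` contains it once.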
Upvotes: 1