Maverick

Reputation: 799

Get href within a table

Sorry, this has most likely been asked before, but I can't find an answer on Stack Overflow or through a search engine.

I'm trying to scrape some data from a table, and I also need to get the href links inside it. The HTML is as follows:

<table class="featprop results">
<tr>
**1)**<td class="propname" colspan="2"><a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"> West Drayton</a></td>
</tr>
<tr><td class="propimg" colspan="2">

    <div class="imgcrop">
    **2)**<a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"><img src="content/images/1/1/641/w296/858.jpg" alt=" Ashford" width="148"/></a>


    <div class="let">&nbsp;</div>
    </div>
</td></tr>

<tr><td class="proprooms">

So far I have used the following:

for table in soup.findAll('table', {'class': 'featprop results'}):
    for tr in table.findAll('tr'):
        for a in tr.findAll('a'):
            print(a)

This returns both 1) and 2) from the HTML above as full tags. Could anyone help me extract just the href link?

Upvotes: 0

Views: 2950

Answers (2)

宏杰李

Reputation: 12168

for table in soup.findAll('table', {'class': 'featprop results'}):
    for tr in table.findAll('tr'):
        for a in tr.findAll('a'):
            print(a['href'])

out:

/lettings-search-results?task=View&itemid=136
/lettings-search-results?task=View&itemid=136
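For anyone who wants to run this end to end, here is a minimal, self-contained sketch built on a trimmed copy of the table from the question. The extra `<a name="no-href-here">` tag is a hypothetical addition, not part of the original page; it is there only to show that `a.get('href')` returns `None` where `a['href']` would raise a `KeyError`.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question.
# The <a name="no-href-here"> tag is a hypothetical addition for illustration.
html = """
<table class="featprop results">
<tr>
<td class="propname" colspan="2"><a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"> West Drayton</a></td>
</tr>
<tr><td class="propimg" colspan="2">
<div class="imgcrop">
<a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"><img src="content/images/1/1/641/w296/858.jpg" alt=" Ashford" width="148"/></a>
<a name="no-href-here">placeholder</a>
</div>
</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

hrefs = []
for table in soup.find_all("table", class_="featprop results"):
    for a in table.find_all("a"):
        href = a.get("href")  # None if the tag has no href attribute
        if href is not None:
            hrefs.append(href)

for href in hrefs:
    print(href)
```

Note that the parser decodes `&amp;` in the attribute value, which is why the printed links contain a plain `&`.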

See the "Attributes" section of the Beautiful Soup documentation.

EDIT:

import re

links = set()  # a set discards duplicate hrefs
# Search the whole soup; "\?" matches the literal question mark in the URL
for a in soup.findAll('a', href=re.compile(r'^/lettings-search-results\?')):
    links.add(a['href'])

See the Beautiful Soup documentation on using a regular expression as a filter.
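One caveat with a `set` is that it does not keep the links in page order. If order matters, a possible variation is to deduplicate with `dict.fromkeys`, which keeps the first occurrence of each href. The sketch below is self-contained; the `itemid=137` link and the `/contact` link are made-up examples, not taken from the original page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML: itemid=136 mirrors the question, while the
# itemid=137 and /contact links are invented for illustration.
html = """
<table class="featprop results">
<tr><td><a href="/lettings-search-results?task=View&amp;itemid=136">West Drayton</a></td></tr>
<tr><td><a href="/lettings-search-results?task=View&amp;itemid=136"><img src="a.jpg"/></a></td></tr>
<tr><td><a href="/lettings-search-results?task=View&amp;itemid=137">Another property</a></td></tr>
<tr><td><a href="/contact">Contact</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only hrefs that start with the results path; "\?" is a literal "?".
pattern = re.compile(r"^/lettings-search-results\?")
matches = [a["href"] for a in soup.find_all("a", href=pattern)]

# dict.fromkeys drops duplicates while preserving first-seen order.
unique_links = list(dict.fromkeys(matches))

for link in unique_links:
    print(link)
```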

Upvotes: 2

Cesar Ho

Reputation: 183

This gives you a list of the <a> tags found under any element with the selected class name.

result = soup.select(".featprop a")
for a in result:
    print(a['href'])

This gives the following result:

/lettings-search-results?task=View&itemid=136
/lettings-search-results?task=View&itemid=136
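If only one of the two links is wanted, the selector can be narrowed. The sketch below is one possible approach, assuming the property-name cell (`td.propname`) is the link you care about; it also adds `[href]` so that anchors without an href are skipped. The HTML is a trimmed copy of the table from the question.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question.
html = """
<table class="featprop results">
<tr>
<td class="propname" colspan="2"><a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"> West Drayton</a></td>
</tr>
<tr><td class="propimg" colspan="2">
<div class="imgcrop">
<a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"><img src="content/images/1/1/641/w296/858.jpg" alt=" Ashford" width="148"/></a>
</div>
</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Every <a> with an href inside a table that has both classes.
all_links = [a["href"] for a in soup.select("table.featprop.results a[href]")]

# Only the link in the property-name cell; select_one returns the
# first match, or None if nothing matches.
name_link = soup.select_one("table.featprop.results td.propname a[href]")

print(all_links)
if name_link is not None:
    print(name_link["href"], name_link.get_text(strip=True))
```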

Upvotes: 1
