Reputation: 67
Unable to fetch the link from the href attribute using Beautiful Soup
I have provided the HTML structure below. I have tried various extraction approaches, but the code returns a blank result every time. Please advise.
<div class="review_list_pagination">
<p class="page_link review_next_page">
<a href="/reviews/in/hotel/best-western-star-residency.html"
id="review_next_page_link">Next page </a>
</p>
</div>
Tried
link = soup.find_all(attrs={"class": "page_link review_next_page"})
link = soup.find_all('p', attrs = {'class': 'page_link review_next_page'})
Result:
[<p class="page_link review_next_page"><a href="/reviews/in/hotel/best-western-star-residency.html?page=2&" id="review_next_page_link">Next page</a></p>,
<p class="page_link review_next_page"> <a href="/reviews/in/hotel/best-western-star-residency.html?page=2&" id="review_next_page_link">Next page</a></p>]
But when I try to read the href from the first result:
print(link[0].get('href'))
Result: Blank
Expected: /reviews/in/hotel/best-western-star-residency.html?page=2&
Upvotes: 2
Views: 161
Reputation: 295
There are lots of different ways to tackle this one; I landed on the one below. Hope that helps.
link = soup.find("p",{"class":"page_link review_next_page"}).a['href']
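One caveat with chaining like this: if the p tag is missing (for example on a last page with no "Next page" link), find returns None and the chain raises an error. A minimal sketch of a guarded version, assuming the HTML from the question; the helper name next_page_href and the sample_html variable are hypothetical, introduced only for illustration:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
sample_html = """
<div class="review_list_pagination">
<p class="page_link review_next_page">
<a href="/reviews/in/hotel/best-western-star-residency.html"
id="review_next_page_link">Next page </a>
</p>
</div>
"""


def next_page_href(html):
    """Return the href of the 'Next page' link, or None if it is absent.

    Hypothetical helper, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    p_tag = soup.find("p", {"class": "page_link review_next_page"})
    # find() returns None when nothing matches, and p_tag.a is None
    # when the paragraph contains no <a> child.
    if p_tag is None or p_tag.a is None:
        return None
    return p_tag.a.get("href")


print(next_page_href(sample_html))
print(next_page_href("<div></div>"))
```

The second call prints None instead of raising, which can be handy as a stop condition when looping over pages.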
Upvotes: 0
Reputation: 24930
For the sake of future generations (:D), you can also use either of these:
soup3.select('a[id="review_next_page_link"]')[0]['href']
#or
soup3.select_one('a[id="review_next_page_link"]')['href']
#or
soup3.select('#review_next_page_link')[0]['href']
... and I'm sure there are more ways to do this. They all output:
'/reviews/in/hotel/best-western-star-residency.html'
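For anyone who wants to check this themselves, here is a small self-contained sketch that runs the three selectors against the HTML from the question and compares the results. The html string and the results list are just illustrative names; soup3 matches the variable used above.

```python
from bs4 import BeautifulSoup

# Markup copied from the question.
html = """
<div class="review_list_pagination">
<p class="page_link review_next_page">
<a href="/reviews/in/hotel/best-western-star-residency.html"
id="review_next_page_link">Next page </a>
</p>
</div>
"""

soup3 = BeautifulSoup(html, "html.parser")

results = [
    # Attribute selector; select() returns a list, so take the first match.
    soup3.select('a[id="review_next_page_link"]')[0]["href"],
    # select_one() returns the first matching tag directly.
    soup3.select_one('a[id="review_next_page_link"]')["href"],
    # ID selector shorthand.
    soup3.select("#review_next_page_link")[0]["href"],
]

# All three selectors should resolve to the same tag, so the set has one item.
print(set(results))
```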
Upvotes: 0
Reputation: 335
Try the following:
link = soup.find('a', {"id": "review_next_page_link"})["href"]
What you are getting back from the soup is the p tag, not the a tag inside it. The href attribute belongs to the inner a tag, so asking the p tag for 'href' returns None.
The line above finds the a tag with id="review_next_page_link" directly, so you can simply read its href value.
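To make the difference concrete, here is a minimal sketch using the HTML from the question. The variable names (html, p_tag, a_tag, href_on_p, href_on_a) are only for illustration:

```python
from bs4 import BeautifulSoup

# Markup copied from the question.
html = """
<div class="review_list_pagination">
<p class="page_link review_next_page">
<a href="/reviews/in/hotel/best-western-star-residency.html"
id="review_next_page_link">Next page </a>
</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The <p> tag carries only a class attribute, so .get('href') finds nothing.
p_tag = soup.find("p", attrs={"class": "page_link review_next_page"})
href_on_p = p_tag.get("href")

# The <a> tag is the element that actually owns the href attribute.
a_tag = soup.find("a", {"id": "review_next_page_link"})
href_on_a = a_tag["href"]

print(href_on_p)  # None
print(href_on_a)  # /reviews/in/hotel/best-western-star-residency.html
```

Note that .get('href') returns None for a missing attribute rather than raising, which is why the original attempt looked "blank" instead of failing loudly.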
Upvotes: 2