Andre Otte

BeautifulSoup .get not returning 'href'

I am building a web-scraping tool that downloads articles to .txt files. I have created the soup with bs4 and pulled out the specific piece of HTML that contains the URL of the article I want to download:

>>> prevLink = soup2.select('.previous_post')
>>> prevLink
[<span class="previous_post">Previous Post: <a href="http://www.mrmoneymustache.com/2018/11/08/honey-badger-entrepreneur/" rel="prev">An Interview With The Man Who Never Needed a Real Job</a></span>]

So far so good (I think). Then I try to use .get('href') to pull out the link, but it returns None:

>>> print(prevLink[0].get('href'))
None

When I use .get('class') to read the class attribute, however, it works:

>>> print(prevLink[0].get('class'))
['previous_post']

I don't understand why .get('class') behaves differently from .get('href'). Thanks for looking.

Answers (1)

alecxe

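prevLink does not reference the link itself but the span element that wraps it. .get() only reads attributes of the element you call it on: the span has a class attribute, so .get('class') works, but the href attribute belongs to the nested a element, so calling .get('href') on the span returns None.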
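To see this directly, here is a small reproduction built from the HTML printed in the question (parsed with the standard "html.parser"; the variable names are just for illustration). It shows that the span's own attributes contain only class, while href is found on its a child:

```python
from bs4 import BeautifulSoup

# HTML copied from the span printed by soup2.select('.previous_post') in the question
html = (
    '<span class="previous_post">Previous Post: '
    '<a href="http://www.mrmoneymustache.com/2018/11/08/honey-badger-entrepreneur/" '
    'rel="prev">An Interview With The Man Who Never Needed a Real Job</a></span>'
)
soup = BeautifulSoup(html, "html.parser")

span = soup.select(".previous_post")[0]
print(span.attrs)          # only the span's own attributes (just 'class')
print(span.get("href"))    # None: the span itself has no href attribute
print(span.a.get("href"))  # the href lives on the nested <a> tag
```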

Select the a element itself by going one level deeper with your selector:

# match the <a> that is a direct child of the .previous_post span
prevLink = soup2.select_one('.previous_post > a')
print(prevLink.get('href'))
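When scraping many articles, some page (for example the oldest one) may have no previous-post link, and then select_one returns None, so calling .get on it raises AttributeError. A minimal sketch that guards against this, assuming every page uses the same span/a markup as in the question; the function name previous_post_url is hypothetical:

```python
from bs4 import BeautifulSoup

def previous_post_url(soup):
    """Return the previous-post href, or None if the page has no such link.

    Hypothetical helper; assumes the markup <span class="previous_post"><a href=...>
    shown in the question.
    """
    # a[href] only matches anchors that actually carry an href attribute
    link = soup.select_one(".previous_post > a[href]")
    # select_one returns None when nothing matches
    return link["href"] if link is not None else None


page = BeautifulSoup(
    '<span class="previous_post">Previous Post: '
    '<a href="http://www.mrmoneymustache.com/2018/11/08/honey-badger-entrepreneur/" '
    'rel="prev">An Interview With The Man Who Never Needed a Real Job</a></span>',
    "html.parser",
)
print(previous_post_url(page))
print(previous_post_url(BeautifulSoup("<p>No previous post</p>", "html.parser")))
```

Returning None instead of raising lets a download loop simply stop when it reaches the first article.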
