Himanshu Ladia

Reputation: 9

Getting href using beautiful soup with different methods

I'm trying to scrape a website. I learned to scrape from two resources: one used tag.get('href') to get the href from an a tag, and one used tag['href'] to get the same. As far as I understand it, they both do the same thing. But when I tried this code:

link_list = [l.get('href') for l in soup.find_all('a')]

it worked with the .get method, but not with dictionary-style access:

link_list = [l['href'] for l in soup.find_all('a')]

This throws a KeyError. I'm very new to scraping, so please pardon me if this is a silly question.

Edit: both methods worked when I used find instead of find_all.

Upvotes: 1

Views: 8791

Answers (2)

alecxe

Reputation: 473763

You can let BeautifulSoup find only the links that actually have an href attribute.

You can do it in two common ways, via find_all():

link_list = [a['href'] for a in soup.find_all('a', href=True)]

Or, with a CSS selector:

link_list = [a['href'] for a in soup.select('a[href]')]
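To see both variants side by side, here is a minimal sketch. The sample markup is made up for illustration (the second a tag deliberately has no href), and it assumes a BeautifulSoup version where select() is available with attribute selectors:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second <a> has no href attribute.
html = """
<a href="/first">first</a>
<a name="anchor-only">no link</a>
<a href="/second">second</a>
"""
soup = BeautifulSoup(html, "html.parser")

# href=True keeps only tags that have an href attribute at all.
via_find_all = [a["href"] for a in soup.find_all("a", href=True)]

# The CSS attribute selector a[href] expresses the same filter.
via_select = [a["href"] for a in soup.select("a[href]")]

print(via_find_all)
print(via_select)
```

Because the tag without an href is filtered out before the list comprehension runs, a['href'] is safe in both cases.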

Upvotes: 5

Serhii

Reputation: 1587

Maybe some of the a tags in your HTML do not have an href attribute? For example:

from bs4 import BeautifulSoup


doc_html = """<a class="vote-up-off" title="This question shows research effort; it is useful and clear">up vote</a>"""
soup = BeautifulSoup(doc_html, 'html.parser')
ahref = soup.find('a')
ahref.get('href')

This returns None without raising an error, but

ahref['href']

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sergey/.virtualenvs/soup_example/lib/python3.5/site-packages/bs4/element.py", line 1011, in __getitem__
    return self.attrs[key]
KeyError: 'href'
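Applied to find_all, the same difference looks roughly like this. This is only a sketch with made-up markup, where one a tag has an href and one does not; has_attr is used here as one possible way to guard the dictionary-style access:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <a> has an href, the second does not.
html = '<a href="/page">with href</a><a class="vote-up-off">up vote</a>'
soup = BeautifulSoup(html, "html.parser")

# .get() never raises; tags without an href simply yield None.
hrefs_with_none = [a.get("href") for a in soup.find_all("a")]

# Dictionary-style access is safe once tags without an href are skipped.
hrefs_only = [a["href"] for a in soup.find_all("a") if a.has_attr("href")]

# find() returns only the first matching tag, which has an href here.
first_href = soup.find("a")["href"]

print(hrefs_with_none)
print(hrefs_only)
print(first_href)
```

This may also be why find appeared to work in your case: it only looks at the first a tag, and if that one happens to have an href, both access styles succeed.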

Upvotes: 0
