user1922364

Reputation: 573

Getting all Links from a page Beautiful Soup

I am using beautifulsoup to get all the links from a page. My code is:

import requests
from bs4 import BeautifulSoup


url = 'http://www.acontecaeventos.com.br/marketing-promocional-sao-paulo'
r = requests.get(url)
html_content = r.text
soup = BeautifulSoup(html_content, 'lxml')

soup.find_all('href')

All that I get is:

[]

How can I get a list of all the href links on that page?

Upvotes: 17

Views: 29533

Answers (3)

Oliver Oliver

Reputation: 2317

To get a list of every href, regardless of tag, use:

href_tags = soup.find_all(href=True)   
hrefs = [tag.get('href') for tag in href_tags]
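A minimal sketch of this approach on a small sample HTML string (the HTML below is illustrative, not from the page in the question): `href=True` matches any tag carrying the attribute, so a `<link>` element is collected alongside the `<a>` tags.

```python
from bs4 import BeautifulSoup

html = """
<html><head><link rel="stylesheet" href="style.css"></head>
<body><a href="/page1">one</a><a href="/page2">two</a></body></html>
"""
soup = BeautifulSoup(html, 'html.parser')

# find_all(href=True) matches any tag that has an href attribute
href_tags = soup.find_all(href=True)
hrefs = [tag.get('href') for tag in href_tags]
print(hrefs)  # ['style.css', '/page1', '/page2']
```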

Upvotes: 2

wbwlkr

Reputation: 199

Replace your last line:

links = soup.find_all('a')

with this line:

links = [a.get('href') for a in soup.find_all('a', href=True)]

It will scrape all the a tags, and for each one, append its href attribute to the links list.

If you want to know more about the loop inside the brackets, read about list comprehensions.
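As a sketch (with made-up sample HTML, not the page from the question), the one-liner is equivalent to an explicit loop; note that `href=True` skips `<a>` tags that have no href attribute at all:

```python
from bs4 import BeautifulSoup

html = '<a href="/a">A</a><a>no href</a><a href="/b">B</a>'
soup = BeautifulSoup(html, 'html.parser')

# Explicit-loop form of: [a.get('href') for a in soup.find_all('a', href=True)]
links = []
for a in soup.find_all('a', href=True):
    links.append(a.get('href'))
print(links)  # ['/a', '/b'] -- the middle <a> has no href and is skipped
```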

Upvotes: 15

Anonta

Reputation: 2540

You are telling the find_all method to find tags named href, not tags that have an href attribute.

You need to find the <a> tags; they're used to represent link elements.

links = soup.find_all('a')

Later you can access their href attributes like this:

link = links[0]          # get the first link in the entire page
url  = link['href']      # get value of the href attribute
url  = link.get('href')  # or like this
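A self-contained sketch of the difference between the two access styles, using made-up sample HTML: subscripting raises KeyError when the attribute is missing, while .get returns None.

```python
from bs4 import BeautifulSoup

html = '<a href="http://example.com">link</a><a name="anchor">no href</a>'
soup = BeautifulSoup(html, 'html.parser')
links = soup.find_all('a')

print(links[0]['href'])      # subscripting works when the attribute exists
print(links[1].get('href'))  # None -- .get returns None for a missing attribute
# links[1]['href'] would raise KeyError instead
```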

Upvotes: 23
