Reputation: 573
I am using BeautifulSoup to get all the links from a page. My code is:
import requests
from bs4 import BeautifulSoup
url = 'http://www.acontecaeventos.com.br/marketing-promocional-sao-paulo'
r = requests.get(url)
html_content = r.text
soup = BeautifulSoup(html_content, 'lxml')
soup.find_all('href')
All that I get is:
[]
How can I get a list of all the href links on that page?
Upvotes: 17
Views: 29533
Reputation: 2317
To get a list of every href, regardless of tag, use:
href_tags = soup.find_all(href=True)
hrefs = [tag.get('href') for tag in href_tags]
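For example, on a small hypothetical HTML snippet (not the page from the question), this picks up href attributes on a and link tags alike:

```python
from bs4 import BeautifulSoup

# A made-up snippet for illustration: href appears on both <link> and <a>
html = """
<html><head><link rel="stylesheet" href="style.css"></head>
<body><a href="/page1">One</a><a name="anchor">No href</a></body></html>
"""
soup = BeautifulSoup(html, 'html.parser')

# find_all(href=True) matches any tag that has an href attribute,
# whatever the tag name is
href_tags = soup.find_all(href=True)
hrefs = [tag.get('href') for tag in href_tags]
print(hrefs)  # ['style.css', '/page1']
```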
Upvotes: 2
Reputation: 199
Replace your last line:
links = soup.find_all('a')
with this line:
links = [a.get('href') for a in soup.find_all('a', href=True)]
It will scrape all the a tags and, for each one, append its href attribute to the links list.
If you want to know more about the for loop between the [], read about list comprehensions.
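The list comprehension is equivalent to an explicit loop. A minimal sketch on a hypothetical snippet:

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; the middle <a> has no href
html = '<body><a href="/a">A</a><a>no href</a><a href="/b">B</a></body>'
soup = BeautifulSoup(html, 'html.parser')

# Equivalent to: links = [a.get('href') for a in soup.find_all('a', href=True)]
links = []
for a in soup.find_all('a', href=True):  # href=True skips <a> tags without href
    links.append(a.get('href'))
print(links)  # ['/a', '/b']
```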
Upvotes: 15
Reputation: 2540
You are telling the find_all method to find href tags, not attributes.
You need to find the <a> tags; they're used to represent link elements.
links = soup.find_all('a')
Later you can access their href attributes like this:
link = links[0] # get the first link in the entire page
url = link['href'] # get value of the href attribute
url = link.get('href') # or like this
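Note the difference between the two access styles: subscripting raises a KeyError when the attribute is missing, while .get() returns None. A short sketch on a hypothetical snippet:

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; the second <a> has no href attribute
html = '<body><a href="/home">Home</a><a name="top">No link</a></body>'
soup = BeautifulSoup(html, 'html.parser')
links = soup.find_all('a')

print(links[0]['href'])      # /home -- subscripting works when href exists
print(links[1].get('href'))  # None -- .get() is safe for tags without href
# links[1]['href'] would raise KeyError here
```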
Upvotes: 23