Reputation: 167
I'm not sure if I'm approaching this correctly. I'm using requests to make a GET:
con = s.get(url)
when I call con.content, the whole page is there. But when I pass con into BS:
soup = BeautifulSoup(con.content)
print(soup.a)
I get None. There are lots of tags in there, none of them behind any JS, and they are all present when I call con.content, but when I try to parse with BS most of the page is not there.
Upvotes: 1
Views: 5292
Reputation: 1832
Change the parser to html5lib
pip install html5lib
And then,
soup = BeautifulSoup(con.content, 'html5lib')
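A minimal end-to-end sketch of that, using a placeholder URL (swap in the page you are actually fetching):
import requests
from bs4 import BeautifulSoup

s = requests.Session()
con = s.get('https://example.com/')  # placeholder URL

# html5lib parses malformed markup the way browsers do, so tags that a
# stricter parser silently drops should survive
soup = BeautifulSoup(con.content, 'html5lib')
print(soup.a)  # first <a> tag in the document, or None if there are none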
Upvotes: 2
Reputation: 281
Without being able to see the HTML you're getting, I just tried this on the Hacker News site and it returns all the a tags as expected.
import requests
from bs4 import BeautifulSoup

s = requests.Session()
con = s.get('https://news.ycombinator.com/')
# Name a parser explicitly so the result doesn't depend on what's installed
soup = BeautifulSoup(con.text, 'html.parser')
links = soup.find_all('a')
for link in links:
    print(link)
Upvotes: 0
Reputation: 77454
The a tags are probably not at the top level; soup.find_all('a') is probably what you wanted.

In general, I have found lxml to be more reliable, more consistent in its API, and faster. Yes, even more reliable: I have repeatedly had documents that BeautifulSoup failed to parse, but lxml in its robust mode, lxml.html.soupparser, still handled them well. And there is the lxml.etree API, which is really easy to use.
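As a rough illustration of that fallback (the broken HTML string here is made up for the example):
from lxml.html import soupparser

# Malformed markup: unclosed <a> tags and no closing </html>
broken = '<html><body><a href="/one">one<a href="/two">two</body>'

# soupparser.fromstring() delegates parsing to BeautifulSoup, then hands
# back lxml elements, so the usual lxml/etree API works on the result
root = soupparser.fromstring(broken)
for link in root.findall('.//a'):
    print(link.get('href'), link.text)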
Upvotes: 1