Reputation: 799
I'm trying to scrape this site, and I want to check all of the anchor tags.
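I'm using Beautiful Soup 4.3.2, and here is my code:
from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib2 import urlopen
from bs4 import BeautifulSoup

url = "http://www.civicinfo.bc.ca/bids?pn=1"
Html = urlopen(url).read()
Soup = BeautifulSoup(Html, 'html.parser')
Content = Soup.find_all('a')  # should be a list of every <a> tag on the page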
My problem is that Content is always empty (i.e. Content = []). Does anyone have any ideas?
Upvotes: 0
Views: 87
Reputation: 169264
According to the documentation, html.parser is not very lenient before certain versions of Python (2.7.3 and 3.2.2), so you're likely looking at some malformed HTML that it gives up on.
What you want to do works if you use lxml instead of html.parser.
From the documentation:
That said, there are things you can do to speed up Beautiful Soup. If you’re not using lxml as the underlying parser, my advice is to start. Beautiful Soup parses documents significantly faster using lxml than using html.parser or html5lib.
So the relevant code would be:
Soup = BeautifulSoup(Html, 'lxml')
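If you're not sure lxml is installed everywhere your script runs, here is a minimal sketch of one way to prefer lxml and fall back to html.parser. The function name extract_links and the sample markup are made up for illustration; they are not part of Beautiful Soup or of the site in question. Beautiful Soup raises FeatureNotFound when the requested parser isn't available, which is what the sketch relies on.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def extract_links(html, parsers=("lxml", "html.parser")):
    """Return the href of every <a> tag, using the first parser that is installed.

    `extract_links` is a hypothetical helper, not a Beautiful Soup API.
    """
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # This parser (e.g. lxml) is not installed; try the next one.
            continue
        # href=True skips anchors that have no href attribute.
        return [a["href"] for a in soup.find_all("a", href=True)]
    raise RuntimeError("none of the requested parsers is installed")


if __name__ == "__main__":
    # Made-up markup with an unclosed <p>, standing in for the real page.
    sample = '<html><body><a href="/bids/1">One</a><p><a href="/bids/2">Two</a></body></html>'
    print(extract_links(sample))
```

You would pass it the Html you already read with urlopen(url).read(). Note that on an old Python where html.parser chokes on the page, the fallback will still come back empty, so installing lxml remains the real fix.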
Upvotes: 2