Sid Kwakkel

Reputation: 799

BeautifulSoup scraping: I'm confused

I'm trying to scrape this site, and I want to check all of the anchor tags.

I'm using Beautiful Soup 4.3.2, and here is my code:

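from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib2 import urlopen
from bs4 import BeautifulSoup

url = "http://www.civicinfo.bc.ca/bids?pn=1"
Html = urlopen(url).read()
Soup = BeautifulSoup(Html, 'html.parser')
Content = Soup.find_all('a')  # list of every <a> tag found in the page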

My problem is that Content is always empty (i.e. Content = []). Does anyone have any ideas?

Upvotes: 0

Views: 87

Answers (1)

mechanical_meat

Reputation: 169264

According to the Beautiful Soup documentation, html.parser is not very lenient in older versions of Python (before 2.7.3 or 3.2.2). So you're likely looking at malformed HTML that html.parser gives up on.

What you want to do works if you use lxml instead of html.parser. Note that lxml is a separate package, so it has to be installed first.

From the documentation:

That said, there are things you can do to speed up Beautiful Soup. If you’re not using lxml as the underlying parser, my advice is to start. Beautiful Soup parses documents significantly faster using lxml than using html.parser or html5lib.

So the relevant code would be:

Soup = BeautifulSoup(Html, 'lxml')
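
If the script may also run somewhere lxml is not installed, one option is to try lxml first and fall back to html.parser. This is only a minimal sketch: the function name find_anchor_hrefs and the sample markup are made up for illustration and are not taken from the site in the question.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def find_anchor_hrefs(html, parsers=("lxml", "html.parser")):
    """Return the href of every <a> tag, using the first parser that is available.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # Beautiful Soup raises this when the requested parser is not installed.
            continue
        return [a.get("href") for a in soup.find_all("a")]
    raise RuntimeError("none of the requested parsers is installed")


# Deliberately sloppy sample markup (assumed, with an unclosed <a> and <p>).
sample = '<p>Bids: <a href="/bids?pn=1">page 1</a> <a href="/bids?pn=2">page 2'
links = find_anchor_hrefs(sample)
print(links)
```

With the code from the question you would call find_anchor_hrefs(Html) instead of passing the sample string. If the list is still empty with lxml, it is worth printing the first few hundred characters of Html to check that the server actually returned the page you expected.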

Upvotes: 2
