BeautifulSoup scraping: I'm confused

Question

I'm trying to scrape this site, and I want to check all of the anchor tags.

I'm using Beautiful Soup 4.3.2, and here is my code:

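from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib2 import urlopen
from bs4 import BeautifulSoup

url = "http://www.civicinfo.bc.ca/bids?pn=1"
Html = urlopen(url).read()
Soup = BeautifulSoup(Html, 'html.parser')
Content = Soup.find_all('a')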

My problem is that Content is always empty (i.e. Content = []). Does anyone have any ideas?

mechanical_meat · Accepted Answer

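According to the documentation, html.parser is not very lenient before certain versions of Python (2.7.3 and 3.2.2), so the page probably contains malformed HTML that it fails to parse.

What you want to do works if you use lxml instead of html.parser (lxml is a separate package that has to be installed).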

From the documentation:

That said, there are things you can do to speed up Beautiful Soup. If you’re not using lxml as the underlying parser, my advice is to start. Beautiful Soup parses documents significantly faster using lxml than using html.parser or html5lib.

So the relevant code would be:

Soup = BeautifulSoup(Html, 'lxml')
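
If you are not sure lxml is installed everywhere the script runs, one option is to try it first and fall back to html.parser. This is only a rough sketch: find_links and the sample markup are made up for illustration and are not part of the original code. It relies on Beautiful Soup raising FeatureNotFound when a requested parser is missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def find_links(html, parsers=("lxml", "html.parser")):
    """Return the href of every <a> tag, using the first parser that is installed."""
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # This parser is not installed; try the next one in the list.
            continue
        return [a.get("href") for a in soup.find_all("a")]
    raise RuntimeError("none of the requested parsers is installed")


# Hypothetical sample markup standing in for the downloaded page.
sample = '<html><body><p>Bids <a href="/bids?pn=2">next</a></p></body></html>'
links = find_links(sample)
print(links)
```

For the real page you would pass Html instead of sample. If the list still comes back empty with lxml, the parser is probably not the cause, and it is worth printing part of Html to check what the server actually returned.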
