Zanoni

Reputation: 30958

Is there a good HTML parser like HtmlAgilityPack (.NET) for Python?

I'm looking for a good HTML parser like HtmlAgilityPack (an open-source .NET project: http://www.codeplex.com/htmlagilitypack), but for use with Python.

Does anyone know of one?

Upvotes: 2

Views: 1028

Answers (3)

aehlke

Reputation: 15831

Others have recommended BeautifulSoup, but it's much better to use lxml. Despite its XML-sounding name, it also parses and scrapes HTML. It's much, much faster than BeautifulSoup, and it even handles "broken" HTML better than BeautifulSoup does (which is BeautifulSoup's claim to fame). It also has a compatibility API for BeautifulSoup if you don't want to learn the lxml API.

Ian Bicking agrees.

There's no reason to use BeautifulSoup anymore, unless you're on Google App Engine or some other platform where anything that isn't pure Python is not allowed.
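As a rough illustration of the lxml approach, here is a minimal sketch that parses a deliberately broken fragment and pulls out the paragraphs and link targets. The sample markup and variable names are made up for this example, and it assumes lxml is installed (for example via `pip install lxml`).

```python
from lxml import html

# Hypothetical sample input: the <p> tags are never closed and </html> is missing.
broken = "<html><body><p>First<p>Second <a href='/docs'>docs</a></body>"

# lxml.html.fromstring repairs the markup and returns the root element.
tree = html.fromstring(broken)

# XPath query for every <p>; text_content() flattens nested tags to plain text.
paragraphs = [p.text_content().strip() for p in tree.xpath("//p")]

# XPath can also select attribute values directly.
links = tree.xpath("//a/@href")

print(paragraphs)
print(links)
```

XPath is the closest thing to how HtmlAgilityPack is usually queried, so this style may feel familiar if you are coming from .NET.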

Upvotes: 8

dmeister

Reputation: 35614

Beautiful Soup is what you are looking for. It is an HTML/XML parser that can deal with invalid pages and lets you, for example, iterate over specific tags.
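To show what iterating over specific tags looks like, here is a small sketch. It assumes the current `bs4` package (installed with `pip install beautifulsoup4`) rather than the older `BeautifulSoup` module, and the sample markup is invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input: the <li> tags are never closed.
broken = "<ul><li><a href='/a'>A</a><li><a href='/b'>B</a></ul>"

# "html.parser" is the parser from Python's standard library, so no C extension is needed.
soup = BeautifulSoup(broken, "html.parser")

# find_all returns every matching tag in document order.
hrefs = []
labels = []
for anchor in soup.find_all("a"):
    hrefs.append(anchor["href"])      # attribute access works like a dict
    labels.append(anchor.get_text())  # text inside the tag

print(hrefs)
print(labels)
```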

Upvotes: 0

Geo

Reputation: 96807

Use Beautiful Soup like everyone does.

Upvotes: 8
