Reputation: 2509

Parse XHTML with Python 3.2

I'm trying to parse a malformed XHTML page in Python 3.2. I just want to get the content of a few tags of the same type, but it seems impossible. Normal XHTML parsers don't like the malformed markup, and BeautifulSoup won't run because of syntax errors in its code. What would be the best way to parse malformed XHTML and get the content of a couple of tags of the same type?

Upvotes: 0

Answers (3)

user1049697

Reputation: 2509

Thanks for the help! "Unfortunately" I solved it myself by using the standard library's html.parser module and creating the parser with html.parser.HTMLParser(strict=False). That made it read the malformed XHTML quite well.
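For anyone landing here later, a minimal sketch of that approach, collecting the text of every occurrence of one tag. The names TagTextCollector and collect_tag_text are made up for illustration and are not part of the standard library. One caveat: the strict argument only existed in Python 3.2 to 3.4 (it was deprecated in 3.3 and removed in 3.5), so this sketch assumes a newer Python, where HTMLParser is always lenient and the argument is simply left out.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every occurrence of one tag name.

    Hypothetical helper for illustration; not a standard-library class.
    """

    def __init__(self, tag):
        # On Python 3.2 to 3.4 this would be super().__init__(strict=False).
        # The argument was removed in 3.5, where parsing is always lenient.
        super().__init__()
        self.tag = tag
        self.depth = 0       # greater than 0 while inside the target tag
        self.results = []    # one string per matched outermost element
        self._buffer = []    # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self._buffer = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        # Keep text only while inside the target tag.
        if self.depth > 0:
            self._buffer.append(data)


def collect_tag_text(markup, tag):
    """Return the text content of each `tag` element found in `markup`."""
    parser = TagTextCollector(tag)
    parser.feed(markup)
    parser.close()
    return parser.results


if __name__ == "__main__":
    # Deliberately broken markup: unclosed <p>, <b>, <body> and <html>.
    broken = ("<html><body><span>one</span><p>unclosed"
              "<div><span>two <b>bold</span></div>")
    print(collect_tag_text(broken, "span"))
```

The parser never builds a tree, it just fires callbacks as it meets tags, which is why unclosed elements elsewhere in the page don't stop it. The flip side is that an unclosed target tag never produces a result, since the text is only stored when the matching end tag arrives.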

Upvotes: 0

Lennart Regebro

Reputation: 172309

"Normal" parsers? lxml usually deals fine with malformed HTML, even though it's quite "normal". :-)

Upvotes: 2

ukessi

Reputation: 1391

You can try pyquery

I'm not sure how malformed your XHTML is, but it's worth a try.

Upvotes: 0
