castiel

Reputation: 2775

How to parse HTML, XHTML, or XML with Python in an efficient way?

My Python environment is 2.7.

I know this is an old question, but I've been losing my mind searching and reading other people's questions and answers. Some of them are really out of date, like the code below:

import lxml  # wrong
import xml   # correct

So, since I'm a newbie to Python and know nothing about its history, I want to make things clearer for myself. For example: what is the so-called standard XML parser module in Python now? What can I do when I need to parse some HTML using XPath syntax? If I have malformed HTML source, how can I handle it without using BeautifulSoup or something similar? If you can brief me on any of this, I'd much appreciate it.

OK, all in all, I really have just one question: how can I parse malformed HTML using a standard Python module with Python 2.7?

Upvotes: 1

Views: 5599

Answers (1)

Francis Avila

Reputation: 31621

If you need to stick to the standard library, read the Python library documentation.
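As a rough illustration of the standard-library route, here is a minimal sketch that pulls links out of malformed HTML with the event-based HTMLParser class (module `HTMLParser` on Python 2.7, `html.parser` on Python 3). The `LinkCollector` class and the `broken` sample string are made up for this example. It assumes a reasonably recent 2.7 release, since older ones were stricter about bad markup. Note that this parser only fires callbacks as it sees tags; it does not build a tree, so there is no XPath here.

```python
# The module was renamed between Python 2 and 3, so try both.
try:
    from HTMLParser import HTMLParser      # Python 2.7
except ImportError:
    from html.parser import HTMLParser     # Python 3


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> start tags."""

    def __init__(self):
        # HTMLParser is an old-style class on Python 2, so call the
        # base initializer directly instead of using super().
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


# Deliberately malformed: unclosed <p>, <a>, <div>, and no </html>.
broken = ("<html><body><p>Unclosed paragraph<a href='/one'>first"
          "<div><a href=\"/two\">second</body>")

collector = LinkCollector()
collector.feed(broken)
collector.close()
print(collector.links)
```

The parser never checks that tags are balanced, which is why the missing closing tags do not matter for this kind of extraction.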

If you don't, definitely look at lxml, which does much more.

Upvotes: 3
