Stevoisiak

Reputation: 26862

What are the differences between lxml and ElementTree?

When it comes to generating XML data in Python, there are two libraries I often see recommended: lxml and ElementTree.

From what I can tell, the two libraries are very similar to each other. They both seem to have similar module names, usage guidelines, and functionality. Even the import statements are fairly similar.

# Importing lxml and ElementTree
import lxml.etree
import xml.etree.ElementTree

What are the differences between the lxml and ElementTree libraries for Python?

Upvotes: 67

Views: 37388

Answers (3)

Chris E

Reputation: 79

I tried both libraries by iterating through a 1.6 GB XML file and extracting some data, and got the following results:

Option A: Using the standard library's xml.etree.ElementTree

import xml.etree.ElementTree as ET
# file_path is the path to the XML file being processed
context = ET.iterparse(file_path, events=("start", "end"))
# ... process the XML file and extract data

Processing time: 242 seconds

Option B: Using the third-party lxml

from lxml import etree
# file_path is the path to the XML file being processed
context = etree.iterparse(file_path, events=("start", "end"), recover=True, huge_tree=True)
# ... process the XML file and extract data

Processing time: 345 seconds

Note that these timings come from one particular XML file and processing routine, and the lxml run also enabled recover and huge_tree, so results may differ for other code and data.
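A comparison like this can be reproduced with a small timing harness. The sketch below is only an assumption about how such a measurement might be set up, not the code used for the numbers above: `count_end_events` and the generated sample document are hypothetical, and only the standard-library parser is timed here. Passing `lxml.etree.iterparse` instead should work the same way if lxml is installed.

```python
import io
import time
import xml.etree.ElementTree as ET


def count_end_events(iterparse, source):
    """Run an iterparse-style function over source and time it.

    Returns (number of "end" events, elapsed seconds).
    """
    started = time.perf_counter()
    count = 0
    for event, elem in iterparse(source, events=("start", "end")):
        if event == "end":
            count += 1
            elem.clear()  # drop element content once it has been handled
    return count, time.perf_counter() - started


# Hypothetical sample data: 1000 <row> elements inside one <rows> root.
sample = b"<rows>" + b"<row>x</row>" * 1000 + b"</rows>"

count, seconds = count_end_events(ET.iterparse, io.BytesIO(sample))
print(f"{count} elements in {seconds:.4f} s")
```

Timing both libraries with identical events and options on the same file gives a more like-for-like comparison than timing them with different parser settings.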

Upvotes: 1

Parfait

Reputation: 107687

ElementTree is built into the Python standard library, which also includes other data-format modules such as json and csv. This means the module ships with every installation of Python. For most everyday XML operations, including building document trees, simple searching, and parsing element attributes and node values (even with namespaces), ElementTree is a reliable handler.
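As a rough illustration of those everyday operations, here is a minimal sketch using only the standard library. The tag names, attribute names, and the namespace URI are made up for the example.

```python
import xml.etree.ElementTree as ET

# Build a small tree; the tag and attribute names are made up for illustration.
root = ET.Element("library")
book = ET.SubElement(root, "book", attrib={"id": "b1"})
ET.SubElement(book, "title").text = "Dive Into XML"

# Simple searching and reading of attributes and node values.
first_book = root.find("book")
book_id = first_book.get("id")
title = root.findtext("book/title")

# Namespaced lookup using a prefix-to-URI mapping (hypothetical namespace).
ns = {"ex": "http://example.com/ns"}
doc = ET.fromstring('<ex:a xmlns:ex="http://example.com/ns"><ex:b>hi</ex:b></ex:a>')
ns_text = doc.find("ex:b", ns).text

print(book_id, title, ns_text)
```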

lxml is a third-party module that requires installation. In many ways lxml extends ElementTree, since most operations in the built-in module are also available in lxml. Chief among its extensions is support for both XPath 1.0 and XSLT 1.0. Additionally, lxml can parse HTML documents that are not XML compliant, so it is used for web scraping, as a parser for BeautifulSoup, and as a parsing engine for pandas.read_html(). Other useful, common features of lxml include pretty_print output, objectify, and SAX support. Also, as a third-party module, new versions with additional features become available more readily than updates to the standard library.
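To make two of these features concrete, the sketch below uses an XPath 1.0 function (`count()`, which is outside the XPath subset ElementTree implements) and `pretty_print` serialization. It assumes lxml is installed; the guarded import, the helper names `count_items` and `pretty`, and the sample XML are hypothetical and only there so the snippet does not crash when lxml is missing.

```python
try:
    from lxml import etree  # third-party; typically installed with: pip install lxml
except ImportError:
    etree = None  # without lxml, the helpers below simply return None

XML_TEXT = "<root><item id='1'>a</item><item id='2'>b</item></root>"


def count_items(xml_text):
    """Count <item> elements with the XPath 1.0 count() function."""
    if etree is None:
        return None
    root = etree.fromstring(xml_text)
    # xpath() returns a float for numeric XPath results, so convert to int.
    return int(root.xpath("count(//item)"))


def pretty(xml_text):
    """Serialize the document with indentation via pretty_print."""
    if etree is None:
        return None
    root = etree.fromstring(xml_text)
    return etree.tostring(root, pretty_print=True).decode()


print(count_items(XML_TEXT))
print(pretty(XML_TEXT))
```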

Upvotes: 60

user9387863

I wouldn't say that lxml is faster than ET across the board, as both modules offer a lot of functionality. For context, ElementTree also supports XPath (a limited subset), and ET has a useful function called iterparse(), which lxml provides as well, that parses the XML document incrementally and exposes it as an iterable of events and elements. Because the whole tree does not have to be built before processing starts, this can make processing much more efficient, especially for large XML files.
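Incremental parsing with iterparse() can be sketched roughly as follows. The `<record>` document is hypothetical, and clearing each element after use is one common way to keep memory low rather than something the function requires.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: three <record> elements under one root.
data = b"<records><record>1</record><record>2</record><record>3</record></records>"

values = []
# Only "end" events are requested, so each element is complete when yielded.
for event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
    if elem.tag == "record":
        values.append(int(elem.text))
        elem.clear()  # release the element's text and children after use

print(values)
```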

The ET API itself creates Element objects, which are a hybrid of a list and a dictionary: child elements are accessed like list items, and attributes like dictionary entries. This can mean headaches for those new to the module, but sit down with it and you'll see that it's pretty flexible.
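A short sketch of that list-and-dictionary behavior, using a made-up `<row>` element:

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<row id="7" kind="demo"><a/><b/><c/></row>')

# List-like: children are accessed by length, index, and iteration.
child_count = len(elem)
first_tag = elem[0].tag
tags = [child.tag for child in elem]

# Dictionary-like: attributes are read and written by key.
row_id = elem.get("id")
elem.set("kind", "example")
attr_keys = sorted(elem.keys())

print(child_count, first_tag, tags, row_id, attr_keys)
```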

Upvotes: 3
