What is an ElementTree object exactly, and how can I get data from it?

Question

I'm trying to teach myself how to parse XML. I've read the lxml tutorials, but they're hard to understand. So far, I can do:

>>> from lxml import etree
>>> xml=etree.parse('ham.xml')
>>> xml

But how can I get data from this object? It can't be indexed like xml[0], and it can't be iterated over.

More specifically, I'm using this xml file and I'm trying to extract, say, everything between the <l> tags that's surrounded by <sp> tags that contain, say, the Barnardo attribute.

Martijn Pieters · Accepted Answer

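It is an ElementTree object wrapping the parsed document; its root node, which you get with xml.getroot(), is an ElementTree Element object.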

You can also look at the lxml API documentation, which has an lxml.etree._Element page. That page tells you about every single attribute and method on that class you could ever want to know about.

I'd start with reading the lxml.etree tutorial, however.

If indexing an element fails (element[0] raises an IndexError), it is an empty tag, and there are no child nodes to retrieve.
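As a rough illustration of the difference between the tree and its elements, here is a minimal sketch. The SAMPLE document is a made-up stand-in for ham.xml, not the real file, and the fallback import of the standard library's xml.etree.ElementTree is only an assumption to let the sketch run where lxml is not installed; every call used here exists in both.

```python
import io

try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml installed
    import xml.etree.ElementTree as etree

# Hypothetical stand-in for ham.xml; not the real file from the question
SAMPLE = b"""<play>
  <sp who="Barnardo"><l>Who's there?</l></sp>
  <sp who="Francisco"><l>Nay, answer me.</l></sp>
  <stage/>
</play>"""

tree = etree.parse(io.BytesIO(SAMPLE))  # an ElementTree: wraps the whole document
root = tree.getroot()                   # an Element: the top-level <play> tag

first = root[0]                              # elements can be indexed ...
child_tags = [child.tag for child in root]   # ... and iterated over
speaker = first.get("who")                   # attribute lookup
line_text = first[0].text                    # text inside the first <l>
empty = root[2]                              # <stage/> has no children

print(root.tag, child_tags, speaker, line_text, len(empty))
```

Indexing empty[0] in this sketch would raise an IndexError, since the empty tag has no child nodes.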

To find all lines by Barnardo, an XPath expression is needed, with a namespace map. It doesn't matter what prefix you use; as long as it is a non-empty string, lxml will map it to the correct namespace URL:

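from lxml import etree

tree = etree.parse('ham.xml')  # the file from the question
# Map an arbitrary prefix to the namespace used by the document
nsmap = {'s': 'http://www.tei-c.org/ns/1.0'}
for line in tree.xpath('.//s:sp[@who="Barnardo"]/s:l/text()', namespaces=nsmap):
    print(line.strip())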

This extracts all text in <l> elements that are contained in <sp who="Barnardo"> tags. Note the s: prefixes on the tag names; the nsmap dictionary tells lxml what namespace to use. I printed these without the surrounding extra whitespace.
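To see why the prefix is needed and why its name is arbitrary, here is a small sketch. It uses findall(), which supports only a subset of XPath, rather than the xpath() call above, so that it also runs on the standard library's xml.etree.ElementTree when lxml is missing (that fallback is an assumption for portability). The SAMPLE document and the barnardo_lines helper are hypothetical; only the namespace URL comes from the answer.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml installed
    import xml.etree.ElementTree as etree

TEI = "http://www.tei-c.org/ns/1.0"

# Hypothetical miniature document; only the namespace URL comes from the answer
SAMPLE = (
    '<TEI xmlns="%s"><text>'
    '<sp who="Barnardo"><l> Who\'s there? </l></sp>'
    '<sp who="Francisco"><l>Nay, answer me.</l></sp>'
    '<sp who="Barnardo"><l>Long live the king!</l></sp>'
    '</text></TEI>' % TEI
)
root = etree.fromstring(SAMPLE)

# Tags are stored with the namespace URL attached, so a bare "sp" matches nothing
print(root[0][0].tag)         # {http://www.tei-c.org/ns/1.0}sp
print(root.findall(".//sp"))  # []


def barnardo_lines(prefix):
    """Collect Barnardo's lines using an arbitrary namespace prefix."""
    path = ".//{0}:sp[@who='Barnardo']/{0}:l".format(prefix)
    found = root.findall(path, namespaces={prefix: TEI})
    return [el.text.strip() for el in found]


print(barnardo_lines("s"))
print(barnardo_lines("tei"))
```

Both calls return the same two lines, because the prefix is only a local alias for the namespace URL in the map.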

For your sample document, that gives:

>>> for line in tree.xpath('.//s:sp[@who="Barnardo"]/s:l/text()', namespaces=nsmap):
...     print(line.strip())
... 
Who's there?
Long live the king!
He.
'Tis now struck twelve; get thee to bed, Francisco.
Have you had quiet guard?
Well, good night.
If you do meet Horatio and Marcellus,
The rivals of my watch, bid them make haste.
Say,
What, is Horatio there?
Welcome, Horatio: welcome, good Marcellus.
I have seen nothing.
Sit down awhile;
And let us once again assail your ears,
That are so fortified against our story
What we have two nights seen.
Last night of all,
When yond same star that's westward from the pole
Had made his course to illume that part of heaven
Where now it burns, Marcellus and myself,
The bell then beating one,

In the same figure, like the king that's dead.
Looks 'a not like the king? mark it, Horatio.
It would be spoke to.
See, it stalks away!
How now, Horatio! you tremble and look pale:
Is not this something more than fantasy?
What think you on't?
I think it be no other but e'en so:
Well may it sort that this portentous figure
Comes armed through our watch; so like the king
That was and is the question of these wars.
'Tis here!
It was about to speak, when the cock crew.
