Harry

Reputation: 4835

Python / ElementTree: following-sibling error (working in xpath tester)

I have a simple XML document (actually ENML for Evernote) as follows:

<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note>
   <div>Here is the Evernote logo:</div>
   <div>
      <en-media type="image/png" hash="a54fe8bcd146e20a8a5742834558543c" />
   </div>
   <div>
      <br />
   </div>
   <div>
      <en-todo />
      Task 1
   </div>
   <div>making it a bit harder</div>
   <div>
      <en-todo />
      Task 2 | 2016-12-31
   </div>
   <div>
      <br />
   </div>
   <div>
      This is another to-do
      <en-todo />
      in an awkward place
   </div>
</en-note>

I'm trying to use XPath to get the text immediately following each en-todo tag. My code is:

parsed_note = ElementTree.fromstring(note_content)
for todo in parsed_note.findall('en-note//en-todo/following-sibling::text()[1]'):
    print(todo.text)

I've tested this using the XPath tester at freeformatter.com. It seems to work, but only when I remove the <!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd"> declaration from the XML; I assume this is a quirk of the tester. The output is:

Text='Task 1'
Text='Task 2 | 2016-12-31'
Text='in an awkward place'

This is exactly as anticipated and desired.

When I attempt to run the code in Python, I get: SyntaxError: prefix 'following-sibling' not found in prefix map.

I suspected this might be the same quirk as in the tester, so I removed the DOCTYPE declaration, but the same error persists.
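For reference, here is a minimal reproduction sketch. It assumes the standard library's xml.etree.ElementTree instead of lxml (the question uses lxml, whose findall() accepts the same limited ElementPath syntax), and it parses a cut-down copy of the note with no DOCTYPE line, which suggests the declaration is not the cause.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the note from the question, without the DOCTYPE line.
NOTE = """<en-note>
   <div>
      <en-todo />
      Task 1
   </div>
</en-note>"""

root = ET.fromstring(NOTE)
error_message = None

try:
    # findall() speaks ElementPath, not full XPath. Its tokenizer reads
    # "following-sibling::text" as a "prefix:name" namespace reference,
    # and no prefix map was supplied, so it raises SyntaxError.
    root.findall('.//en-todo/following-sibling::text()[1]')
except SyntaxError as exc:
    error_message = str(exc)

print(error_message)  # prefix 'following-sibling' not found in prefix map
```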

I'm using the standard parser:

import defusedxml.lxml as lxml
from lxml import etree as ElementTree

Where am I going wrong? Is my XPath expression flawed, or is there some other reason for this that I'm missing?

EDIT: @Tomalak has provided a working solution that uses Python's tail property instead of a full XPath expression. Given @alecxe's comments that the documentation referenced is not for lxml, I will leave this open in case anyone wants to venture an idea about why the original problem exists when lxml should have a full XPath implementation.

Upvotes: 1

Views: 1252

Answers (2)

Tomalak

Reputation: 338118

Note: this answer targets xml.etree.ElementTree. The similar but more advanced lxml.etree module has full XPath support, but the method shown below works there as well.


Straight from the documentation, emphasis mine:

19.7.2. XPath support

This module provides limited support for XPath expressions for locating elements in a tree. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the module.
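To illustrate what that "small subset of the abbreviated syntax" covers, here is a sketch against a trimmed copy of the question's note; the particular paths are only examples. Descendant searches and simple [tag] and [@attr='value'] predicates work, while axes such as following-sibling:: are not part of the subset.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the note from the question (no DOCTYPE, fewer divs).
NOTE = """<en-note>
   <div>Here is the Evernote logo:</div>
   <div>
      <en-media type="image/png" hash="a54fe8bcd146e20a8a5742834558543c" />
   </div>
   <div>
      <en-todo />
      Task 1
   </div>
   <div>making it a bit harder</div>
   <div>
      <en-todo />
      Task 2 | 2016-12-31
   </div>
</en-note>"""

root = ET.fromstring(NOTE)

# Abbreviated descendant step: every <en-todo> anywhere below the root.
todos = root.findall('.//en-todo')

# [tag] predicate: <div> elements that have an <en-todo> child.
todo_divs = root.findall('.//div[en-todo]')

# [@attr='value'] predicate: match on an attribute value.
media = root.find(".//en-media[@type='image/png']")

print(len(todos), len(todo_divs), media.get('hash'))
# 2 2 a54fe8bcd146e20a8a5742834558543c
```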

You can work around it by doing part of the traversal in Python.

In this case it's particularly easy because there's a convenient tail property you can use. Other cases require more work.

parsed_note = ElementTree.fromstring(note_content)
for todo in parsed_note.findall('.//en-todo'):
    print(todo.tail)

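You will have to .strip() the surrounding whitespace from each returned value. Also note that tail is None when no text follows an element, so guard against that before stripping.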
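Putting the tail approach and the whitespace stripping together, a minimal runnable sketch with the standard library's xml.etree.ElementTree might look like this. The helper name todo_texts is hypothetical, and the (todo.tail or '') guard is an added assumption for elements that have no trailing text.

```python
import xml.etree.ElementTree as ET

# The note from the question, DOCTYPE declaration included.
NOTE = """<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note>
   <div>Here is the Evernote logo:</div>
   <div>
      <en-media type="image/png" hash="a54fe8bcd146e20a8a5742834558543c" />
   </div>
   <div>
      <br />
   </div>
   <div>
      <en-todo />
      Task 1
   </div>
   <div>making it a bit harder</div>
   <div>
      <en-todo />
      Task 2 | 2016-12-31
   </div>
   <div>
      <br />
   </div>
   <div>
      This is another to-do
      <en-todo />
      in an awkward place
   </div>
</en-note>"""


def todo_texts(note_content):
    """Hypothetical helper: return the stripped text after each <en-todo/>."""
    parsed_note = ET.fromstring(note_content)
    texts = []
    for todo in parsed_note.findall('.//en-todo'):
        # .tail holds the text between this element's end and the next
        # sibling (or the parent's end tag); it is None if there is none.
        texts.append((todo.tail or '').strip())
    return texts


print(todo_texts(NOTE))
# ['Task 1', 'Task 2 | 2016-12-31', 'in an awkward place']
```

The external DTD is not fetched by the default parser, so the DOCTYPE line does not get in the way here.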

Upvotes: 1

alecxe

Reputation: 473763

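In lxml, find() and findall() only understand the limited ElementPath syntax (the same subset as xml.etree), so following-sibling:: is misread as a namespace prefix. For full XPath you should have used the xpath() method: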

for todo in root.xpath('//en-note//en-todo/following-sibling::text()[1]'):
    print(todo)

Also note that I've added // at the beginning and removed .text: xpath() already returns the text nodes themselves as strings, and they don't have a .text attribute.

Upvotes: 3
