Reputation: 4835
I have a simple XML document (actually ENML for Evernote) as follows:
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note>
<div>Here is the Evernote logo:</div>
<div>
<en-media type="image/png" hash="a54fe8bcd146e20a8a5742834558543c" />
</div>
<div>
<br />
</div>
<div>
<en-todo />
Task 1
</div>
<div>making it a bit harder</div>
<div>
<en-todo />
Task 2 | 2016-12-31
</div>
<div>
<br />
</div>
<div>
This is another to-do
<en-todo />
in an awkward place
</div>
</en-note>
I'm trying to use XPath to access the text immediately after the en-todo tags. My code is:
parsed_note = ElementTree.fromstring(note_content)
for todo in parsed_note.findall('en-note//en-todo/following-sibling::text()[1]'):
    print todo.text
I've tested this using the XPath tester at freeformatter.com. It seems to work, but only when I remove the <!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd"> declaration from the XML; I assume this is a quirk of the tester. The output is:
Text='Task 1'
Text='Task 2 | 2016-12-31'
Text='in an awkward place'
This is exactly as anticipated and desired.
When I attempt to run the code in Python, I get: SyntaxError: prefix 'following-sibling' not found in prefix map.
I suspected this might be related to the same quirk as in the tester, so I removed the DOCTYPE declaration, but the same error persists.
I'm using the standard parser:
import defusedxml.lxml as lxml
from lxml import etree as ElementTree
Where am I going wrong? Is my XPath expression flawed, or is there some other reason for this that I'm missing?
EDIT: @Tomalek has provided a solution that works, using the Python tail property instead of the full XPath expression. Given the comments from @alecxe that the docs referenced are not for lxml, I will leave this open in case anyone wants to venture an idea about why the original problem exists when there should be a full XPath implementation.
Upvotes: 1
Views: 1252
Reputation: 338118
Note: this answer is targeted at xml.etree.ElementTree. The similar, but more advanced lxml.etree module has full XPath support, but the method shown below works there as well.
Straight from the documentation, emphasis mine:
19.7.2. XPath support
This module provides limited support for XPath expressions for locating elements in a tree. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the module.
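To illustrate the limitation quoted above, here is a minimal sketch using only the standard library's xml.etree.ElementTree. The shortened note and the helper name try_findall are assumptions made for this example, not part of the original code. It shows findall() rejecting an expression that uses the following-sibling axis while accepting a plain descendant search.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical note used only for this illustration.
NOTE = "<en-note><div><en-todo/>Task 1</div></en-note>"
root = ET.fromstring(NOTE)


def try_findall(element, path):
    """Return (matches, None) on success or (None, error message) on failure."""
    try:
        return element.findall(path), None
    except SyntaxError as exc:
        return None, str(exc)


# The axis syntax is outside the supported subset, so this call fails.
axis_matches, axis_error = try_findall(
    root, ".//en-todo/following-sibling::text()[1]"
)
print(axis_error)

# A plain descendant search is within the supported subset.
plain_matches, plain_error = try_findall(root, ".//en-todo")
print([match.tail for match in plain_matches])
```

The limited path parser appears to read the text before the first colon as a namespace prefix, which would explain why the error message talks about a prefix map rather than an unsupported axis.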
You can work around it by doing part of the traversal in Python.
In this case it's particularly easy because there's a convenient tail property you can use. Other cases require more work.
parsed_note = ElementTree.fromstring(note_content)
for todo in parsed_note.findall('.//en-todo'):
    print todo.tail
You will have to .strip() whitespace from the returned value. Note that tail is None when no text follows the element.
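As a rough sketch of the tail-based approach with the whitespace handling included, the following uses the standard library's xml.etree.ElementTree on a trimmed copy of the note from the question (the DOCTYPE line and unrelated divs are left out for brevity). The function name todo_texts is an assumption made for this example, and print() is written as a function so the snippet also runs on Python 3.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the note from the question (DOCTYPE omitted for brevity).
NOTE_CONTENT = """<en-note>
<div>
<en-todo />
Task 1
</div>
<div>
<en-todo />
Task 2 | 2016-12-31
</div>
<div>
This is another to-do
<en-todo />
in an awkward place
</div>
</en-note>"""


def todo_texts(note_content):
    """Return the stripped text that directly follows each en-todo element."""
    root = ET.fromstring(note_content)
    texts = []
    for todo in root.iter("en-todo"):
        # tail is None when no text follows the element, so guard before stripping.
        tail = (todo.tail or "").strip()
        if tail:
            texts.append(tail)
    return texts


print(todo_texts(NOTE_CONTENT))
# ['Task 1', 'Task 2 | 2016-12-31', 'in an awkward place']
```

Because tail only covers the text up to the next sibling element, this matches what following-sibling::text()[1] returns for this particular note; notes with other markup directly after an en-todo may need extra handling.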
Upvotes: 1
Reputation: 473763
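You should have used the xpath() method. In lxml, findall() only understands the limited ElementPath subset, while xpath() gives you the full XPath 1.0 engine: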
for todo in parsed_note.xpath('//en-note//en-todo/following-sibling::text()[1]'):
    print todo
Also note that I've added the // at the beginning and removed the .text: the expression already returns text nodes, and they don't have a .text attribute.
Upvotes: 3