Reputation: 2552
I have the following "example.xml" file:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag1>
    <tag2>tag2<!-- comment = “this is the tag1 comment”--></tag2>
    <tag3>
      <tag4>tag4<!-- comment = “this is the tag4 comment”--></tag4>
    </tag3>
  </tag1>
</root>
I'd like to retrieve the comment that belongs to a specific node. So far, I can only retrieve all the comments in the file, using the following:
from lxml import etree
tree = etree.parse("example.xml")
comments = tree.xpath('//comment()')
print(comments)
As expected, this returns all the above comments from the file in a list:
[<!-- comment = \u201cthis is the tag1 comment\u201d-->, <!-- comment = \u201cthis is the tag4 comment\u201d-->]
However, how and where do I explicitly specify the node whose comment I want to retrieve? For example, how can I specify tag2 somewhere so that only <!-- comment = \u201cthis is the tag1 comment\u201d--> is returned?
EDIT
I have a use case where I need to iterate over every node of the XML file. When the iteration reaches a node that has more than one child with a comment, the query returns the comments of all those children. For example, consider the following "example2.xml" file:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag1>
    <tag2>
      <tag3>tag3<!-- comment = “this is the tag3 comment”--></tag3>
      <tag4>tag4<!-- comment = “this is the tag4 comment”--></tag4>
    </tag2>
  </tag1>
  <tag1>
    <tag2>
      <tag3>tag3<!-- comment = “this is the tag3 comment”--></tag3>
      <tag4>tag4<!-- comment = “this is the tag4 comment”--></tag4>
    </tag2>
  </tag1>
</root>
If I follow the same steps as above, then when the loop reaches tag1/tag2, the query returns the comments of both tag3 and tag4. That is:
from lxml import etree
tree = etree.parse("example2.xml")
comments = tree.xpath('tag1[1]/tag2//comment()')
print(comments)
returns
[<!-- comment = \u201cthis is the tag3 comment\u201d-->, <!-- comment = \u201cthis is the tag4 comment\u201d-->]
My two questions are therefore:
Upvotes: 1
Views: 1005
Reputation: 18116
You need to specify the node:
from lxml import etree

tree = etree.parse("example.xml")
comments = tree.xpath('//tag2/comment()')
print(comments)
Output:
[<!-- comment = “this is the tag1 comment”-->]
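If the tag name comes from a variable (for example, inside a loop), lxml lets you pass XPath variables as keyword arguments to xpath() instead of building the expression with string formatting. A minimal sketch, assuming lxml is installed; the question's example.xml is inlined so it runs on its own, and comments_for is a hypothetical helper name, not part of lxml.

```python
from lxml import etree

# The question's example.xml, inlined so the sketch is self-contained.
# lxml rejects str input that carries an encoding declaration,
# so the text is encoded to bytes before parsing.
EXAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag1>
    <tag2>tag2<!-- comment = “this is the tag1 comment”--></tag2>
    <tag3>
      <tag4>tag4<!-- comment = “this is the tag4 comment”--></tag4>
    </tag3>
  </tag1>
</root>"""

root = etree.fromstring(EXAMPLE_XML.encode("utf-8"))


def comments_for(root, tag):
    """Hypothetical helper: texts of the comments that are direct children of <tag>."""
    # $tag is an XPath variable; lxml binds it from the keyword argument.
    nodes = root.xpath("//*[name() = $tag]/comment()", tag=tag)
    # .text holds the comment body without the <!-- --> markers.
    return [node.text.strip() for node in nodes]


print(comments_for(root, "tag2"))
print(comments_for(root, "tag4"))
```

Because the variable is bound by lxml rather than pasted into the expression, a tag name containing quotes cannot break the XPath syntax.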
Edit:
For your nested structure, you need to iterate over the repeating tags and query each one with a relative path:
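from lxml import etree

tree = etree.parse("example2.xml")
tag2_elements = tree.xpath('//tag1/tag2')
for t2 in tag2_elements:
    # Relative path: only the comment inside this tag2's own tag3
    t3_comment = t2.xpath('tag3/comment()')
    print(t2, t3_comment)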
Output:
<Element tag2 at 0x1066b69b0> [<!-- comment = “this is the tag3 comment”-->]
<Element tag2 at 0x1066b6960> [<!-- comment = “this is the tag3 comment”-->]
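To extend the loop to every element, as in the question's edit, one option is to walk all elements and ask each one only for its own child comments with comment(), labelling each result with tree.getpath(). A minimal sketch, assuming lxml is installed, with example2.xml inlined; direct_comments is a hypothetical helper name.

```python
from lxml import etree

# The question's example2.xml, inlined so the sketch is self-contained.
EXAMPLE2_XML = """<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag1>
    <tag2>
      <tag3>tag3<!-- comment = “this is the tag3 comment”--></tag3>
      <tag4>tag4<!-- comment = “this is the tag4 comment”--></tag4>
    </tag2>
  </tag1>
  <tag1>
    <tag2>
      <tag3>tag3<!-- comment = “this is the tag3 comment”--></tag3>
      <tag4>tag4<!-- comment = “this is the tag4 comment”--></tag4>
    </tag2>
  </tag1>
</root>"""

# Encode first: lxml rejects str input that has an encoding declaration.
tree = etree.fromstring(EXAMPLE2_XML.encode("utf-8")).getroottree()


def direct_comments(tree):
    """Hypothetical helper: map each element's path to its own child comments."""
    result = {}
    # Passing etree.Element as the tag filter yields elements only, skipping comment nodes.
    for el in tree.getroot().iter(etree.Element):
        # "comment()" uses the child axis, so comments deeper in the subtree are excluded.
        texts = [c.text.strip() for c in el.xpath("comment()")]
        if texts:
            result[tree.getpath(el)] = texts
    return result


for path, texts in direct_comments(tree).items():
    print(path, texts)
```

The tag2 elements no longer report their children's comments; each tag3 and tag4 reports exactly one, keyed by a path such as /root/tag1[2]/tag2/tag4.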
Upvotes: 1
Reputation: 311750
You can get the first comment like this:
>>> from lxml import etree
>>> with open('data.xml', 'rb') as fd:
...     doc = etree.parse(fd)
...
>>> doc.xpath('/root/tag1/tag2/comment()')
[<!-- comment = “this is the tag1 comment”-->]
And for the last comment:
>>> doc.xpath('/root/tag1/tag3/tag4/comment()')
[<!-- comment = “this is the tag4 comment”-->]
...and of course you can use //tag2 or //tag4 if those elements are unique and you don't want to use the full path.
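The shortcut picks out a single comment only when the element name is unique. In the question's example2.xml, tag3 appears twice, so you need a predicate on the repeating tag1, or one applied to the whole result set. A minimal sketch, assuming lxml is installed, with example2.xml inlined.

```python
from lxml import etree

# The question's example2.xml, inlined so the sketch is self-contained.
EXAMPLE2_XML = """<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag1>
    <tag2>
      <tag3>tag3<!-- comment = “this is the tag3 comment”--></tag3>
      <tag4>tag4<!-- comment = “this is the tag4 comment”--></tag4>
    </tag2>
  </tag1>
  <tag1>
    <tag2>
      <tag3>tag3<!-- comment = “this is the tag3 comment”--></tag3>
      <tag4>tag4<!-- comment = “this is the tag4 comment”--></tag4>
    </tag2>
  </tag1>
</root>"""

# Encode first: lxml rejects str input that has an encoding declaration.
root = etree.fromstring(EXAMPLE2_XML.encode("utf-8"))

# tag3 occurs twice, so the short form matches both of its comments.
short_form = root.xpath("//tag3/comment()")
# A positional predicate on the repeating ancestor selects a single occurrence.
second_tag1_only = root.xpath("/root/tag1[2]/tag2/tag3/comment()")
# Parentheses apply [1] to the whole node-set: the first match in document order.
first_overall = root.xpath("(//tag3/comment())[1]")

print(len(short_form), len(second_tag1_only), len(first_overall))  # 2 1 1
```

Without the parentheses, //tag3/comment()[1] still returns two nodes, because the [1] is then applied separately to the comments of each tag3.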
Upvotes: 1
Reputation: 435
Change your XPath expression to //tag2/comment(). Specifying only //comment() matches comments anywhere in the document, under any tag.
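The same distinction explains the question's edit: / selects children and // selects all descendants, so tag1[1]/tag2//comment() reaches down into tag3 and tag4. A minimal sketch comparing the two axes, assuming lxml is installed, with example2.xml inlined.

```python
from lxml import etree

# The question's example2.xml, inlined so the sketch is self-contained.
EXAMPLE2_XML = """<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag1>
    <tag2>
      <tag3>tag3<!-- comment = “this is the tag3 comment”--></tag3>
      <tag4>tag4<!-- comment = “this is the tag4 comment”--></tag4>
    </tag2>
  </tag1>
  <tag1>
    <tag2>
      <tag3>tag3<!-- comment = “this is the tag3 comment”--></tag3>
      <tag4>tag4<!-- comment = “this is the tag4 comment”--></tag4>
    </tag2>
  </tag1>
</root>"""

# Encode first: lxml rejects str input that has an encoding declaration.
root = etree.fromstring(EXAMPLE2_XML.encode("utf-8"))

# "//" is the descendant axis: comments anywhere under the first tag1's tag2.
descendants = root.xpath("tag1[1]/tag2//comment()")
# "/" is the child axis: only comments that are direct children of tag2 (there are none).
children = root.xpath("tag1[1]/tag2/comment()")
# Naming the child element narrows the result to a single comment.
tag3_only = root.xpath("tag1[1]/tag2/tag3/comment()")

print(len(descendants), len(children), len(tag3_only))  # 2 0 1
```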
Upvotes: 1