Reputation: 2552
I have an XML file, 'example.xml', in a format similar to the following:
<ParentOne> <!-- Comment 1-->
<SiblingOneA>This is Sibling One A</SiblingOneA>
<SiblingTwoA> <!-- Comment -->
<ChildOneA>Value of child one A <!-- Comment 2--></ChildOneA>
<!-- <break>Comment between tags</break>-->
<ChildTwoA>Value of child two A</ChildTwoA>
<ChildThreeA>Value of child three A</ChildThreeA>
<!-- <break>Another comment between tags</break>-->
<ChildFourA>Value of child four A</ChildFourA>
</SiblingTwoA>
</ParentOne>
As you can see, some comments belong to a particular tag, while others sit in between tags. I'm trying to write something that retrieves the comments that are in between tags, along with their locations.
For example, I would like to be told that the first "break" comment is in between the ChildOneA and ChildTwoA tags. This is my code:
from lxml import etree
doc = etree.parse('example.xml')
root = doc.getroot()
for tag in doc.xpath('//*'):
    comment = tag.xpath('{0}/comment()'.format(doc.getpath(tag)), namespaces=root.nsmap)
    print(comment)
    # Do some other stuff
This code returns:
[<!-- Comment 1-->]
[]
[<!-- Comment -->, <!-- <break>Comment between tags</break>-->, <!-- <break>Another comment between tags</break>-->]
[<!-- Comment 2-->]
[]
[]
[]
I understand why the 3rd element in the list, which corresponds to SiblingTwoA, returns 3 comments: the 2 break comments are technically children of that tag. However, is there a way to find out that the first of those break comments is between the ChildOneA and ChildTwoA tags, and the second is between the ChildThreeA and ChildFourA tags?
Happy to clarify if required, as this may be a bit confusing.
Upvotes: 1
Views: 168
Reputation: 24928
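I believe you are looking for something like this. For each comment, the preceding-sibling and following-sibling axes give the nearest element on either side of it: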
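for tag in doc.xpath('//*'):
    # Comments that are direct children of this element
    for c in tag.xpath('./comment()'):
        # Nearest element sibling on each side of the comment
        bef = c.xpath('./preceding-sibling::*[1]')
        aft = c.xpath('./following-sibling::*[1]')
        # Report only comments that have an element on both sides
        if bef and aft:
            print(c, 'is between', bef[0].tag, 'and', aft[0].tag)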
Output:
<!-- <break>Comment between tags</break>--> is between ChildOneA and ChildTwoA
<!-- <break>Another comment between tags</break>--> is between ChildThreeA and ChildFourA
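If it helps, here is a self-contained sketch of the same idea that collects the results instead of printing them. It parses the sample XML from a string rather than from 'example.xml', and the function name comments_between_tags is just a hypothetical name chosen for illustration. It assumes that "in between tags" means the comment has an element sibling on both sides.

```python
from lxml import etree

# Sample document from the question, inlined so the sketch runs on its own.
XML = b"""<ParentOne> <!-- Comment 1-->
<SiblingOneA>This is Sibling One A</SiblingOneA>
<SiblingTwoA> <!-- Comment -->
<ChildOneA>Value of child one A <!-- Comment 2--></ChildOneA>
<!-- <break>Comment between tags</break>-->
<ChildTwoA>Value of child two A</ChildTwoA>
<ChildThreeA>Value of child three A</ChildThreeA>
<!-- <break>Another comment between tags</break>-->
<ChildFourA>Value of child four A</ChildFourA>
</SiblingTwoA>
</ParentOne>"""


def comments_between_tags(root):
    """Return (comment_text, previous_tag, next_tag) tuples.

    Hypothetical helper: only comments that have an element sibling
    on both sides are included.
    """
    found = []
    # Select every comment node directly, in document order.
    for c in root.xpath('//comment()'):
        bef = c.xpath('preceding-sibling::*[1]')  # nearest element before
        aft = c.xpath('following-sibling::*[1]')  # nearest element after
        if bef and aft:
            found.append((c.text, bef[0].tag, aft[0].tag))
    return found


root = etree.fromstring(XML)
results = comments_between_tags(root)
for text, before, after in results:
    print(text.strip(), '| between', before, 'and', after)
```

Selecting '//comment()' directly avoids the outer loop over every element, and returning tuples makes it easier to do further processing on each comment's neighbours.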
Upvotes: 1