Adam

Reputation: 2552

Get comment location outside of element in an XML file - Python

I've got an XML file, 'example.xml' of a similar format to the following:

<ParentOne> <!-- Comment 1-->
   <SiblingOneA>This is Sibling One A</SiblingOneA> 
   <SiblingTwoA> <!-- Comment -->
      <ChildOneA>Value of child one A <!-- Comment 2--></ChildOneA>
      <!-- <break>Comment between tags</break>-->
      <ChildTwoA>Value of child two A</ChildTwoA>
      <ChildThreeA>Value of child three A</ChildThreeA>
      <!-- <break>Another comment between tags</break>-->
      <ChildFourA>Value of child four A</ChildFourA>
   </SiblingTwoA>
</ParentOne>

As you can see, some comments belong to a particular tag, while others sit between tags. I'm trying to write something that retrieves the comments that sit between tags, along with their locations.

For example, I would like a way to determine that the first "break" comment is between the ChildOneA and ChildTwoA tags. This is my code:

from lxml import etree

doc = etree.parse('example.xml')
root = doc.getroot()

for tag in doc.xpath('//*'):
   comment = tag.xpath('{0}/comment()'.format(doc.getpath(tag)), namespaces=root.nsmap)
   print(comment)
   # Do some other stuff

This code returns:

[<!-- Comment 1-->]
[]
[<!-- Comment -->, <!-- <break>Comment between tags</break>-->, <!-- <break>Another comment between tags</break>-->]
[<!-- Comment 2-->]
[]
[]
[]

I understand why the 3rd list, which corresponds to SiblingTwoA, contains 3 comments: the 2 break comments are technically children of that tag. However, is there a way to find out that the first of those break comments is between the ChildOneA and ChildTwoA tags, and the second is between the ChildThreeA and ChildFourA tags?

Happy to clarify if required as this may be a bit confusing to understand.

Upvotes: 1

Views: 168

Answers (1)

Jack Fleeting

Reputation: 24928

I believe you are looking for something like this:

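# `doc` is the tree parsed in the question: doc = etree.parse('example.xml')
for tag in doc.xpath('//*'):
    # Comments that are direct children of this element
    for c in tag.xpath('./comment()'):
        # Nearest element siblings on either side of the comment
        bef = c.xpath('./preceding-sibling::*[1]')
        aft = c.xpath('./following-sibling::*[1]')
        # Report only comments with an element on both sides
        if bef and aft:
            print(c, 'is between', bef[0].tag, 'and', aft[0].tag)

Output:

<!-- <break>Comment between tags</break>--> is between ChildOneA and ChildTwoA
<!-- <break>Another comment between tags</break>--> is between ChildThreeA and ChildFourA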
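
If you want the locations as data rather than printed lines, here is a minimal self-contained sketch of the same idea. It selects every comment directly with //comment() and keeps only those that have an element sibling on both sides. The function name comments_between_tags and the inline XML string are my own placeholders for illustration; with your file you would pass etree.parse('example.xml').getroot() instead.

```python
from lxml import etree

# Inline copy of the sample document from the question (illustrative only).
XML = """<ParentOne> <!-- Comment 1-->
   <SiblingOneA>This is Sibling One A</SiblingOneA>
   <SiblingTwoA> <!-- Comment -->
      <ChildOneA>Value of child one A <!-- Comment 2--></ChildOneA>
      <!-- <break>Comment between tags</break>-->
      <ChildTwoA>Value of child two A</ChildTwoA>
      <ChildThreeA>Value of child three A</ChildThreeA>
      <!-- <break>Another comment between tags</break>-->
      <ChildFourA>Value of child four A</ChildFourA>
   </SiblingTwoA>
</ParentOne>"""


def comments_between_tags(root):
    """Return (comment_text, previous_tag, next_tag) for each comment
    that has an element sibling on both sides."""
    found = []
    for comment in root.xpath('//comment()'):
        # `*` matches elements only, so other comments are skipped over.
        before = comment.xpath('preceding-sibling::*[1]')
        after = comment.xpath('following-sibling::*[1]')
        if before and after:
            found.append((comment.text.strip(), before[0].tag, after[0].tag))
    return found


root = etree.fromstring(XML)
for text, prev_tag, next_tag in comments_between_tags(root):
    print(f'{text!r} is between {prev_tag} and {next_tag}')
```

Comments that are the first or last node inside their parent (such as "Comment 1" or "Comment 2" in your sample) are dropped because one of the two sibling lookups comes back empty.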

Upvotes: 1
