Adam

Reputation: 2552

Get all children of specific node in Python

I have the following example.xml structure:

<ParentOne>
   <SiblingOneA>This is Sibling One A</SiblingOneA>
   <SiblingTwoA>
      <ChildOneA>Value of child one A</ChildOneA>
      <ChildTwoA>Value of child two A</ChildTwoA>
   </SiblingTwoA>
</ParentOne>

<ParentTwo>
   <SiblingOneA>This is a different value for Sibling one A</SiblingOneA>
   <SiblingTwoA>
      <ChildOneA>This is a different value for Child one A</ChildOneA>
      <ChildTwoA>This is a different value for Child Two A</ChildTwoA>
   </SiblingTwoA>
</ParentTwo>

<ParentThree>
   <SiblingOneA>A final value for Sibling one A</SiblingOneA>
   <SiblingTwoA>
      <ChildOneA>A final value for Child one A</ChildOneA>
      <ChildTwoA>A final value for Child one A</ChildTwoA>
   </SiblingTwoA>
</ParentThree>

I need to loop through every node and, whenever the current node is "SiblingOneA", check whether the sibling node directly after it is "SiblingTwoA". If it is, the code should retrieve all of that node's children (both the elements themselves and the values within them).

So far, this is my code:

import os
from lxml import etree

XMLDoc = etree.parse('example.xml')
rootXMLElement = XMLDoc.getroot()

for Node in XMLDoc.xpath('//*'):
   # getpath() returns e.g. /ParentOne/SiblingOneA; basename keeps the last step
   if os.path.basename(XMLDoc.getpath(Node)) == "SiblingOneA":
      if Node.getnext() is not None:
         if Node.getnext().tag == "SiblingTwoA":
            # RETRIEVE ALL THE CHILDREN ELEMENTS OF THAT SPECIFIC SiblingTwoA NODE AND THEIR VALUES
            pass

I do not know what to put in place of the comment to retrieve all the child elements and values of the "SiblingTwoA" node. The code should not return the children of every SiblingTwoA node in the tree, only those of the one in question (i.e. the one returned by Node.getnext()). Note that many of the element names are the same, but their values differ.

EDIT:

I have been able to retrieve the children of the element in question using Node.getnext().getchildren(). However, this returns the information in the form of a list, such as:

[<Element ChildOneA at 0x101a95870>, <Element ChildTwoA at 0x101a958c0>]
[<Element ChildOneA at 0x101a95a50>, <Element ChildTwoA at 0x101a95aa0>]
[<Element ChildOneA at 0x101a95c30>, <Element ChildTwoA at 0x101a95c80>]

How can I retrieve the actual values within the elements?

My desired output, for the first iteration for example, would be something like:

ChildOneA = Value of child one A

ChildTwoA = Value of child two A

Upvotes: 0

Views: 2052

Answers (1)

Martin Honnen

Reputation: 167571

To generate a simple flat list (['Value of child one A', 'Value of child two A', 'This is a different value for Child one A', 'This is a different value for Child Two A', 'A final value for Child one A', 'A final value for Child one A']), I think you can use the following, assuming doc is the parsed document:

[child.xpath('string()') for sibling in doc.xpath('//SiblingTwoA[preceding-sibling::*[1][self::SiblingOneA]]') for child in sibling.xpath('*')]
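As a rough, self-contained sketch of how that expression might be run: the snippet below parses a shortened copy of the question's data from a string rather than from example.xml. The wrapping <Root> element and the names XML and flat_values are assumptions made for illustration (an XML document can only have one top-level element, so the three Parent* elements need some common parent).

```python
from lxml import etree

# Assumption: a hypothetical <Root> wrapper, since a well-formed XML
# document needs a single top-level element. Only two parents are kept
# to keep the sketch short.
XML = """<Root>
<ParentOne>
   <SiblingOneA>This is Sibling One A</SiblingOneA>
   <SiblingTwoA>
      <ChildOneA>Value of child one A</ChildOneA>
      <ChildTwoA>Value of child two A</ChildTwoA>
   </SiblingTwoA>
</ParentOne>
<ParentTwo>
   <SiblingOneA>This is a different value for Sibling one A</SiblingOneA>
   <SiblingTwoA>
      <ChildOneA>This is a different value for Child one A</ChildOneA>
      <ChildTwoA>This is a different value for Child Two A</ChildTwoA>
   </SiblingTwoA>
</ParentTwo>
</Root>"""

doc = etree.fromstring(XML)

# Select every SiblingTwoA whose immediately preceding sibling element
# is a SiblingOneA, then take the string value of each child element.
flat_values = [
    child.xpath('string()')
    for sibling in doc.xpath('//SiblingTwoA[preceding-sibling::*[1][self::SiblingOneA]]')
    for child in sibling.xpath('*')
]

print(flat_values)
```

The predicate preceding-sibling::*[1][self::SiblingOneA] does the adjacency check inside XPath, so no explicit getnext() call is needed in Python.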

To generate a nested list ([['Value of child one A', 'Value of child two A'], ['This is a different value for Child one A', 'This is a different value for Child Two A'], ['A final value for Child one A', 'A final value for Child one A']]), you can use

[[child.xpath('string()') for child in sibling.xpath('*')] for sibling in doc.xpath('//SiblingTwoA[preceding-sibling::*[1][self::SiblingOneA]]')]
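If the "ChildOneA = Value of child one A" style of output from the question is wanted, one possible approach is to keep the getnext() check and read each child's tag and text attributes. The following is only a sketch under assumptions: the <Root> wrapper, the XML string, and the helper name children_after_sibling_one are hypothetical and not part of the original code.

```python
from lxml import etree

# Assumption: a hypothetical <Root> wrapper around a shortened copy of
# the question's data, so that the document is well-formed.
XML = """<Root>
<ParentOne>
   <SiblingOneA>This is Sibling One A</SiblingOneA>
   <SiblingTwoA>
      <ChildOneA>Value of child one A</ChildOneA>
      <ChildTwoA>Value of child two A</ChildTwoA>
   </SiblingTwoA>
</ParentOne>
<ParentTwo>
   <SiblingOneA>This is a different value for Sibling one A</SiblingOneA>
   <SiblingTwoA>
      <ChildOneA>This is a different value for Child one A</ChildOneA>
      <ChildTwoA>This is a different value for Child Two A</ChildTwoA>
   </SiblingTwoA>
</ParentTwo>
</Root>"""


def children_after_sibling_one(root):
    """Return one list of (tag, text) pairs per SiblingTwoA that
    directly follows a SiblingOneA (hypothetical helper)."""
    results = []
    for node in root.iter('SiblingOneA'):
        nxt = node.getnext()
        if nxt is not None and nxt.tag == 'SiblingTwoA':
            # Iterating an element yields its direct children only.
            results.append([(child.tag, child.text) for child in nxt])
    return results


root = etree.fromstring(XML)
pairs = children_after_sibling_one(root)

for group in pairs:
    for tag, text in group:
        print(f"{tag} = {text}")
```

Here child.tag gives the element name and child.text gives the text directly inside it; for elements with nested markup, child.xpath('string()') as used above would collect all descendant text instead.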

Upvotes: 2
