Adam

Reputation: 2552

Retrieve value of child node given attribute using Python

I have an XSD file of the following format:

<?xml version="1.0" encoding="UTF-8"?><xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:type name="type1">
        <xsd:example>
          <xsd:description>This is the description of said type1 tag</xsd:description>
        </xsd:example>
    </xsd:type>
    <xsd:type name="type2">
        <xsd:example>
          <xsd:description>This is the description of said type2 tag</xsd:description>
        </xsd:example>
    </xsd:type>
    <xsd:type name="type3">
        <xsd:example>
          <xsd:description>This is the description of said type3 tag</xsd:description>
        </xsd:example>
    </xsd:type>
</xsd:schema>

and the following XML file:

<theRoot>
    <type1>hi from type1</type1>
    <theChild>
        <type2>hi from type2</type2>
        <type3>hi from type3</type3>
    </theChild>
</theRoot>

I'd like to retrieve the text of the xsd:description element that sits under the xsd:type element whose name attribute is "type1". In other words, I'd like to retrieve "This is the description of said type1 tag".

I have tried to do this with lxml in the following way using Python:

from lxml import etree
XSDDoc = etree.parse(xsdFile)
root = XSDDoc.getroot()
result = root.findall(".//xsd:type/xsd:example/xsd:description[@name='type1']", root.nsmap)

I've used the same example and solution mentioned here. However, my code returns an empty list instead of the description element.

For reference, my Python version is: Python 2.7.10

EDIT: When I use an example provided in the answer by retrieving the XML structure from a string, the result is as expected. However, when I try to retrieve from a file, I get empty lists returned (or None).

I am doing the following. The code loops over each node in a separate XML file, then looks up the description for that node's name in the XSD file:

import os

XMLDoc = etree.parse(xmlFile)
root = XSDDoc.getroot()

for Node in XMLDoc.xpath('//*'):
    # getpath() returns e.g. "/theRoot/theChild/type2"; keep the last segment
    nameVariable = os.path.basename(XMLDoc.getpath(Node))
    description = XSDDoc.find(".//xsd:type[@name='{0}']/xsd:example/xsd:description".format(nameVariable), root.nsmap)

If I try to print description.text, I get:

AttributeError: 'NoneType' object has no attribute 'text'

Upvotes: 0

Views: 607

Answers (1)

mzjn

Reputation: 51042

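The predicate ([@name='type1']) must be attached to the element that actually carries the attribute. The name attribute is on the xsd:type element, not on xsd:description, so the original expression matches nothing. This should work: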

result = root.findall(".//xsd:type[@name='type1']/xsd:example/xsd:description", root.nsmap)

# result is a list
for r in result:
    print(r.text)
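
To see the difference side by side, here is a minimal sketch that runs both expressions against a trimmed copy of the schema from the question. The names NS, DOC, misplaced and correct are made up for this illustration, the namespace map is written out by hand instead of taken from root.nsmap, and the import falls back to the standard library's ElementTree (which supports the same find/findall subset) if lxml is not installed:

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib ElementTree is an acceptable stand-in here,
    # since only fromstring() and findall() are used.
    import xml.etree.ElementTree as etree

# Explicit prefix-to-URI map (hypothetical name, used instead of root.nsmap)
NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}

# Trimmed copy of the schema from the question
DOC = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:type name="type1">
        <xsd:example>
          <xsd:description>This is the description of said type1 tag</xsd:description>
        </xsd:example>
    </xsd:type>
    <xsd:type name="type2">
        <xsd:example>
          <xsd:description>This is the description of said type2 tag</xsd:description>
        </xsd:example>
    </xsd:type>
</xsd:schema>"""

root = etree.fromstring(DOC)

# Predicate on xsd:description, which has no name attribute: nothing matches
misplaced = root.findall(".//xsd:type/xsd:example/xsd:description[@name='type1']", NS)

# Predicate on xsd:type, which is where the name attribute lives
correct = root.findall(".//xsd:type[@name='type1']/xsd:example/xsd:description", NS)

print(misplaced)                  # []
print([d.text for d in correct])  # ['This is the description of said type1 tag']
```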

In case you only want a single node, you can use find instead of findall. Complete example:

from lxml import etree

xsdFile = """
<root xmlns:xsd='http://whatever.com'>
 <xsd:type name="type1">
     <xsd:example>
       <xsd:description>This is the description of said type1 tag</xsd:description>
     </xsd:example>
 </xsd:type>
</root>"""

root = etree.fromstring(xsdFile)
result = root.find(".//xsd:type[@name='type1']/xsd:example/xsd:description", root.nsmap)

print(result.text)
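
Regarding the EDIT in the question: a likely cause of the AttributeError is that the loop also visits theRoot and theChild, which have no matching xsd:type entry, so find returns None for them. The following is only a sketch under that assumption, using the two documents from the question as strings. The names XSD_NS, XSD_TEXT, XML_TEXT and describe_nodes are hypothetical, and node.tag is used in place of os.path.basename(getpath(...)), which assumes the data XML has no namespaces (getpath can also return positional segments such as type2[1] when siblings share a name). The import falls back to the standard library if lxml is missing:

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: stdlib ElementTree is enough for fromstring(), iter() and find()
    import xml.etree.ElementTree as etree

XSD_NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}  # hypothetical name

XSD_TEXT = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:type name="type1">
        <xsd:example>
          <xsd:description>This is the description of said type1 tag</xsd:description>
        </xsd:example>
    </xsd:type>
    <xsd:type name="type2">
        <xsd:example>
          <xsd:description>This is the description of said type2 tag</xsd:description>
        </xsd:example>
    </xsd:type>
    <xsd:type name="type3">
        <xsd:example>
          <xsd:description>This is the description of said type3 tag</xsd:description>
        </xsd:example>
    </xsd:type>
</xsd:schema>"""

XML_TEXT = """<theRoot>
    <type1>hi from type1</type1>
    <theChild>
        <type2>hi from type2</type2>
        <type3>hi from type3</type3>
    </theChild>
</theRoot>"""


def describe_nodes(xml_root, xsd_root):
    """Map each element name in the data XML to its description, or None."""
    descriptions = {}
    for node in xml_root.iter():
        if not isinstance(node.tag, str):
            continue  # skip comments and processing instructions
        path = ".//xsd:type[@name='{0}']/xsd:example/xsd:description".format(node.tag)
        match = xsd_root.find(path, XSD_NS)
        # find() returns None when nothing matches, so guard before using .text
        descriptions[node.tag] = match.text if match is not None else None
    return descriptions


descriptions = describe_nodes(etree.fromstring(XML_TEXT), etree.fromstring(XSD_TEXT))
for name, text in descriptions.items():
    print("{0}: {1}".format(name, text))
```

With this guard, names without a schema entry (here theRoot and theChild) map to None instead of raising, while type1, type2 and type3 resolve to their descriptions.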

Upvotes: 1
