Reputation: 1092
I'm trying to extract a value using XPath in Python with lxml's etree. I have no control over the .xml file I receive, and I suspect it may be invalid.
My method already extracts the text node object I want to examine.
# This is the tag.
textTag = lastExportTree.xpath("//TEXT_NODE[@PROPERTY = '%s']/TEXT[@ID = '%s']" % (key, id[1]))
# This is a part of the xml. I already have the text node I want to examine.
<TEXT ID="1001" STATE="5" LOCKED="false"><SYSTEMMESSAGE>CALBUY</SYSTEMMESSAGE>Hiho</TEXT>
<TEXT ID="1002" STATE="1" LOCKED="false"/>
<TEXT ID="1003" STATE="5" LOCKED="false">Stack</TEXT>
<TEXT ID="1004" STATE="1" LOCKED="false">Overflow</TEXT>
If I want to access the content of ID="1003" I only have to type:
print(textTag.text)  # prints 'Stack'
But the tag with ID="1001" also contains a SYSTEMMESSAGE tag. How can I access the content 'Hiho'? (textTag.text doesn't work here.) Is the XML I receive invalid?
Thanks a lot for your answers!
Upvotes: 0
Views: 2434
Reputation: 3493
I've encountered this problem before as well, and this is what we ended up with. In our case we were interested in finding the text in all the non-script and non-style children of an element.
from lxml import etree
# Pre-compile the XPath. It collects the text nodes of this element and of
# every descendant element that isn't 'script' or 'style'.
textXpath = etree.XPath(
'(.|.//*[not(name()="script")][not(name()="style")])/text()')
# If instead you don't want to include the current element:
# textXpath = etree.XPath(
# './/*[not(name()="script")][not(name()="style")]/text()')
results = ''.join(textXpath(textTag))
It might not be the prettiest chunk of code, but it's what we've resorted to.
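To see what that expression returns for the XML in the question, here is a minimal sketch. The `TEXT_NODE` wrapper with `PROPERTY="demo"` and the names `all_text` and `own_text` are assumptions made up for illustration; only the `TEXT` elements come from the question. Note that the pre-compiled expression also picks up the text inside `SYSTEMMESSAGE`, whereas a plain `text()` returns only the text nodes that sit directly under the element.

```python
from lxml import etree

# Hypothetical wrapper element; only the TEXT elements are from the question.
xml = (
    '<TEXT_NODE PROPERTY="demo">'
    '<TEXT ID="1001" STATE="5" LOCKED="false">'
    '<SYSTEMMESSAGE>CALBUY</SYSTEMMESSAGE>Hiho</TEXT>'
    '<TEXT ID="1003" STATE="5" LOCKED="false">Stack</TEXT>'
    '</TEXT_NODE>'
)
root = etree.fromstring(xml)
text_tag = root.xpath("TEXT[@ID='1001']")[0]

# Text of the element itself plus all non-script, non-style descendants.
all_text = etree.XPath(
    '(.|.//*[not(name()="script")][not(name()="style")])/text()')
# Only the text nodes that are direct children of the element.
own_text = etree.XPath('text()')

joined = ''.join(all_text(text_tag))   # includes the SYSTEMMESSAGE content
direct = ''.join(own_text(text_tag))   # just the element's own text

print(joined)
print(direct)
```

If you only want 'Hiho', the second form is probably the one you are after.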
Upvotes: 1
Reputation: 77347
Assuming you are showing us the nodes under lastExportTree, this should do it:
lastExportTree.xpath('TEXT[@STATE="5" and @LOCKED="false" and SYSTEMMESSAGE]/text()')[0]
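That says to find all child nodes named TEXT that have the given STATE and LOCKED attributes and a SYSTEMMESSAGE child element, and to select the text nodes directly under them. The [0] then takes the first match. Note that the predicate does not filter on ID, so it matches every TEXT element that fits this description.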
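As for why `textTag.text` comes back empty: the XML is well-formed (this is ordinary mixed content), but lxml stores text that follows a child element in that child's `.tail`, and `.text` only holds the text before the first child. A small sketch, assuming a made-up `ROOT` wrapper around the elements from the question (the variable names are illustrative too):

```python
from lxml import etree

# ROOT is a hypothetical wrapper; the TEXT elements are from the question.
root = etree.fromstring(
    '<ROOT>'
    '<TEXT ID="1001" STATE="5" LOCKED="false">'
    '<SYSTEMMESSAGE>CALBUY</SYSTEMMESSAGE>Hiho</TEXT>'
    '<TEXT ID="1002" STATE="1" LOCKED="false"/>'
    '<TEXT ID="1003" STATE="5" LOCKED="false">Stack</TEXT>'
    '</ROOT>'
)

# The XPath from this answer: direct text nodes of the matching TEXT element.
via_xpath = root.xpath(
    'TEXT[@STATE="5" and @LOCKED="false" and SYSTEMMESSAGE]/text()')[0]

text_1001 = root.find("TEXT[@ID='1001']")
# No text precedes SYSTEMMESSAGE, so .text is None on the TEXT element.
before_child = text_1001.text
# The text after </SYSTEMMESSAGE> lives in the child's .tail.
via_tail = text_1001.find('SYSTEMMESSAGE').tail

print(via_xpath, before_child, via_tail)
```

Both routes give the same string; the XPath version is handier when you don't know in advance which child elements may appear.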
Upvotes: 0