Rho Phi

Reputation: 1240

Search the values of my XML with Python

I have XML like this, and I would like to obtain the value of the line with tag=035 and code=a for the node where tag=035 and code=9 is "BAI". I have tried to identify the node where BAI appears with the following, and then ask for its parent node:

[ _sub.getparent() for _sub in _xml.findall(".//*[@tag='035']/*[@code='9']") if(_sub.text=='BAI') ]

But the parent is empty. How do I get my 035,a at the node where 035,9='BAI'?

Upvotes: 0

Views: 64

Answers (1)

XMLSchemer

Reputation: 146

You can do it all in pure XPath like this:

//*[@tag='035']/*[@code='9'][. = 'BAI']/following-sibling::*[@code='a']

That formulation assumes that whatever validates or produces your data guarantees that the [@code='a'] element always comes after the [@code='9'] element within the same parent.

You can also, and perhaps preferably, write the XPath like this:

//*[@tag='035']/*[@code='9'][. = 'BAI']/../*[@code='a']

Or like this:

//*[@tag='035'][subfield[@code='9' and . = 'BAI']]/subfield[@code='a']

Or more generally, without assuming that the child elements are named subfield:

//*[@tag='035'][child::*[@code='9' and . = 'BAI']]/child::*[@code='a']

These last three formulations assume nothing about the order of the child elements.

XPath is a very powerful language, and XPath 3.0 in particular is Turing-complete, which makes it even more powerful.

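As far as lxml is concerned, findall() only understands the limited ElementPath subset of XPath, so it won't take all of those formulations (the following-sibling axis, for example, is not supported there). tree.xpath() implements full XPath 1.0 and accepts every expression above. Luckily, the shortest and sweetest one is accepted by findall() as well. Note that it returns the matching elements, so read .text to get the values:

from lxml import etree

# Parse the XML file from disk
tree = etree.parse("data/search.xml")

# code='a' elements that share a parent with a code='9' element whose text is 'BAI'
matches = tree.findall(".//*[@tag='035']/*[@code='9'][. = 'BAI']/../*[@code='a']")
print([el.text for el in matches])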
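To try the parent-step formulation without a file on disk, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree (Python 3.7 or later, which is when the [.='text'] predicate was added). The sample document, including the element names record, datafield and subfield and all of the values, is a hypothetical stand-in, because the question's XML is not shown. The code='a' element is deliberately placed before the code='9' element to show that the order does not matter.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: element names and values are assumptions,
# not the asker's real XML.
SAMPLE = """
<record>
  <datafield tag="035">
    <subfield code="a">A.B.Author.1</subfield>
    <subfield code="9">BAI</subfield>
  </datafield>
  <datafield tag="035">
    <subfield code="9">ORCID</subfield>
    <subfield code="a">0000-0000-0000-0000</subfield>
  </datafield>
  <datafield tag="100">
    <subfield code="a">Author, A. B.</subfield>
  </datafield>
</record>
"""

root = ET.fromstring(SAMPLE)

# Find the code='9' child whose text is 'BAI', step up to its parent with '..',
# then pick the code='a' sibling regardless of its position.
path = ".//*[@tag='035']/*[@code='9'][.='BAI']/../*[@code='a']"
values = [el.text for el in root.findall(path)]

print(values)  # ['A.B.Author.1']
```

The same path string should also work with lxml's findall(); with lxml's xpath() the other formulations become available too.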

Hope this helps!

Upvotes: 2
