Reputation: 1340
I'm using the lxml library with Python 2.7 to parse XML files. I need to detect sibling elements that have no text between them. For example, in the following XML fragment:
<cross-ref> [t1] </cross-ref> ***some text*** <cross-ref> [t2] </cross-ref>
<cross-ref> [t3] </cross-ref><cross-ref> [t4] </cross-ref>
Assuming I have already selected all elements with the cross-ref tag, I need a way to detect only the cross-ref elements on the second line, where the second element comes right after the first one with no text between them. I guess a for loop like the following is needed, but this code prints both [t1] and [t3]:
for c in cross_refs:
    # detect ***some text*** or do something else here
    if c.getnext().tag == "cross-ref":
        print c.text
I need to modify it so that the output is only [t3].
Upvotes: 1
Reputation: 89325
The triple-nested if statements in your answer can also be expressed in XPath as follows:
following-sibling::node()[1][self::cross-ref]
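In short, the XPath returns the nearest following sibling node only if it is a cross-ref element. Notice that node() here matches text nodes as well as element nodes, so if any text (even whitespace) sits between the two elements, the nearest sibling is a text node and the predicate fails. The XPath can be used as follows: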
for c in cross_refs:
    if c.xpath('boolean(following-sibling::node()[1][self::cross-ref])'):
        print c.text
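To make the "node means text or element" point concrete, here is a rough emulation of following-sibling::node()[1][self::cross-ref] using the standard library's xml.etree.ElementTree, swapped in for lxml only so the sketch has no third-party dependency; it is not lxml's XPath engine. ElementTree stores text in .text and .tail attributes instead of as separate nodes, so the hypothetical helpers child_nodes() and next_sibling_node() rebuild the interleaved node list that XPath would see. The question's fragment is wrapped in an assumed <p> root so it parses, and comments and processing instructions (which ElementTree's default parser discards) are not modelled.

```python
import xml.etree.ElementTree as ET

# The question's fragment, wrapped in an assumed <p> root so it parses.
SAMPLE = ("<p><cross-ref> [t1] </cross-ref> ***some text*** "
          "<cross-ref> [t2] </cross-ref>\n"
          "<cross-ref> [t3] </cross-ref><cross-ref> [t4] </cross-ref></p>")


def child_nodes(parent):
    """Rebuild an XPath-style node list of parent's children (hypothetical).

    ElementTree keeps text in .text/.tail attributes, not as nodes, so the
    text strings are interleaved with the child elements here. Empty text
    is skipped because XPath has no empty text nodes.
    """
    nodes = [parent.text] if parent.text else []
    for child in parent:
        nodes.append(child)
        if child.tail:
            nodes.append(child.tail)
    return nodes


def next_sibling_node(parent, elem):
    """Emulate following-sibling::node()[1]: the node right after elem."""
    nodes = child_nodes(parent)
    i = next(k for k, n in enumerate(nodes) if n is elem)
    return nodes[i + 1] if i + 1 < len(nodes) else None


root = ET.fromstring(SAMPLE)
matches = []
for c in root.findall("cross-ref"):
    nxt = next_sibling_node(root, c)
    # [self::cross-ref]: keep c only if that node is a cross-ref *element*;
    # a text string (even a bare newline) makes the test fail.
    if ET.iselement(nxt) and nxt.tag == "cross-ref":
        matches.append(c.text)
print(matches)  # [' [t3] ']
```

Only [t3] survives: [t1] is followed by the " ***some text*** " text node, [t2] by the newline ending the first line, and [t4] by nothing.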
Or, if you like, you can select only the cross-ref elements that match this criterion in the first place:
cross_refs = tree.xpath('//cross-ref[following-sibling::node()[1][self::cross-ref]]')
for c in cross_refs:
    print c.text
Upvotes: 2
Reputation: 1340
I solved the problem using the tail property. When c.tail is None, I can say that the two elements are adjacent with no text between them. The code looks like this:
for c in cross_refs:
    if c.getnext() is not None:
        if c.getnext().tag == c.tag:
            if c.tail is None:
                print c.text
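The tail-based check can also be tried without lxml. Below is a minimal sketch using the standard library's xml.etree.ElementTree, an assumption made for portability (the answer itself uses lxml). ElementTree has no getnext(), so the sketch walks consecutive children of a known parent pairwise instead; the function name adjacent_siblings and its ignore_whitespace flag are hypothetical. The flag covers a case the strict "tail is None" test does not: when two elements are separated only by a newline or indentation, tail holds that whitespace rather than None.

```python
import xml.etree.ElementTree as ET

# The question's fragment, wrapped in an assumed <p> root so it parses.
SAMPLE = ("<p><cross-ref> [t1] </cross-ref> ***some text*** "
          "<cross-ref> [t2] </cross-ref>\n"
          "<cross-ref> [t3] </cross-ref><cross-ref> [t4] </cross-ref></p>")


def adjacent_siblings(parent, tag, ignore_whitespace=False):
    """Yield children of `parent` with tag `tag` whose next sibling element
    has the same tag and whose tail holds no text (hypothetical helper)."""
    children = list(parent)
    # Pair each child with the sibling element that follows it.
    for c, nxt in zip(children, children[1:]):
        tail = c.tail or ""          # None or "" means no text between c and nxt
        if ignore_whitespace:
            tail = tail.strip()      # treat newlines/indentation as "no text"
        if c.tag == tag and nxt.tag == c.tag and not tail:
            yield c


root = ET.fromstring(SAMPLE)
strict = [c.text for c in adjacent_siblings(root, "cross-ref")]
lenient = [c.text for c in adjacent_siblings(root, "cross-ref",
                                             ignore_whitespace=True)]
print(strict)   # [' [t3] ']
print(lenient)  # [' [t2] ', ' [t3] ']
```

With the strict check only [t3] is reported, because [t2]'s tail is the newline that ends the first line; with ignore_whitespace=True, [t2] is reported as well. Which behavior is right depends on whether formatting whitespace in your files should count as text.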
Upvotes: 0