Detect whether the immediate sibling of an element is text or another element in lxml

Question

I'm using the lxml library and Python 2.7 to parse XML files. I need to detect sibling elements that have no text between them. For example, in the following XML portion:

<cross-ref>[t1]</cross-ref> some text <cross-ref>[t2]</cross-ref>
<cross-ref>[t3]</cross-ref><cross-ref>[t4]</cross-ref>

Say I have already found all cross-ref elements. I need a way to detect only the cross-ref elements on the second line, where the second element follows the first with no text between them. I guess something like the following for loop is needed, but this code prints both [t1] and [t3], because getnext() returns the next sibling element and skips over any text in between:

for c in cross_refs:
    # detect "some text" or do something else here
    nxt = c.getnext()  # next sibling element, or None if c is the last child
    if nxt is not None and nxt.tag == "cross-ref":
        print(c.text)

I need to modify it so the output will be only [t3].
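One way to see why getnext() alone is not enough: in lxml, as in the standard library's ElementTree, text that follows an element inside its parent is stored in that element's .tail attribute rather than as a separate sibling. So "no text between c and the next cross-ref" can be checked as "c.tail is empty". The minimal sketch below assumes a sample reconstructed from the question, wrapped in a hypothetical <p> parent, and uses the stdlib xml.etree.ElementTree, which lacks lxml's getnext(), so it pairs up the children of each parent instead. The function name adjacent_cross_refs is hypothetical. Any tail text, even a single newline, counts as text here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample reconstructed from the question (the <p> wrapper is assumed).
SAMPLE = (
    "<p>"
    "<cross-ref>[t1]</cross-ref> some text <cross-ref>[t2]</cross-ref>\n"
    "<cross-ref>[t3]</cross-ref><cross-ref>[t4]</cross-ref>"
    "</p>"
)


def adjacent_cross_refs(root, tag="cross-ref"):
    """Return the text of every `tag` element whose next sibling is another
    `tag` element with no text (not even whitespace) in between.

    Text after an element lives in that element's .tail, so "no text in
    between" means the tail is None or an empty string.
    """
    result = []
    for parent in root.iter():
        children = list(parent)
        # Pair each child element with the sibling element right after it.
        for current, nxt in zip(children, children[1:]):
            if current.tag == tag and nxt.tag == tag and not current.tail:
                result.append(current.text)
    return result


root = ET.fromstring(SAMPLE)
print(adjacent_cross_refs(root))  # ['[t3]']
```

With lxml, the same idea fits into the question's loop by adding "and not c.tail" to the condition on c.getnext().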

har07 · Accepted Answer

The triple nested ifs in your answer can also be expressed in XPath as follows:

following-sibling::node()[1][self::cross-ref]

In short, the XPath returns the nearest following sibling node only if it is a cross-ref element. Note that node() matches any kind of node, so a text node (including whitespace-only text) counts as a sibling just as an element node does. The XPath can be used as follows:

for c in cross_refs:
    if c.xpath('boolean(following-sibling::node()[1][self::cross-ref])'):
        print(c.text)

Alternatively, you can select only the cross-ref elements that match this criterion in the first place:

from lxml import etree

tree = etree.parse("input.xml")  # hypothetical file name
cross_refs = tree.xpath('//cross-ref[following-sibling::node()[1][self::cross-ref]]')
for c in cross_refs:
    print(c.text)
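To make the meaning of following-sibling::node()[1] concrete, here is a rough emulation on the stdlib ElementTree model, assuming the question's sample inside a hypothetical <p> parent. Each parent's children are flattened into a sequence of text strings and elements, and the check asks whether the node right after an element is a cross-ref element. The helper names and the skip_blank_text option are hypothetical additions, not part of the answer; with skip_blank_text=True, whitespace-only text is ignored, which the XPath above does not do.

```python
import xml.etree.ElementTree as ET


def child_nodes(parent):
    """Yield the children of `parent` as an XPath-like node sequence:
    text strings and elements in document order. The stdlib parser drops
    comments and processing instructions by default, so they never appear."""
    if parent.text:
        yield parent.text
    for child in parent:
        yield child
        if child.tail:
            yield child.tail


def first_following_sibling_node(parent, element, skip_blank_text=False):
    """Rough equivalent of following-sibling::node()[1] for `element`."""
    nodes = list(child_nodes(parent))
    start = next(i for i, n in enumerate(nodes) if n is element)
    for node in nodes[start + 1:]:
        # Optionally treat whitespace-only text as "no text".
        if skip_blank_text and isinstance(node, str) and not node.strip():
            continue
        return node
    return None


def is_followed_by(parent, element, tag="cross-ref", skip_blank_text=False):
    """Emulates the predicate [following-sibling::node()[1][self::cross-ref]]."""
    node = first_following_sibling_node(parent, element, skip_blank_text)
    return node is not None and not isinstance(node, str) and node.tag == tag


root = ET.fromstring(
    "<p><cross-ref>[t1]</cross-ref> some text <cross-ref>[t2]</cross-ref>\n"
    "<cross-ref>[t3]</cross-ref><cross-ref>[t4]</cross-ref></p>"
)
strict = [c.text for c in root if is_followed_by(root, c)]
lenient = [c.text for c in root if is_followed_by(root, c, skip_blank_text=True)]
print(strict)   # ['[t3]']
print(lenient)  # ['[t2]', '[t3]']
```

If both lines share a parent, the strict form reports only [t3], while the lenient form also reports [t2] because only a newline separates it from [t3]. Whether whitespace counts as "text" therefore changes the result.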
