sheshkovsky

Reputation: 1340

Detect whether the immediate sibling of an element is text or another element in lxml

I'm using the lxml library and Python 2.7 to parse XML files. I need to detect sibling elements that have no text between them. For example, in the following XML fragment:

<cross-ref> [t1] </cross-ref> ***some text*** <cross-ref> [t2] </cross-ref>  
<cross-ref> [t3] </cross-ref><cross-ref> [t4] </cross-ref>

Assuming I have already collected all elements with the cross-ref tag, I need a way to detect only the cross-ref elements on the second line, where the second element follows the first with no text between them. I guess something like the following for loop is needed, but obviously this code prints both [t1] and [t3]:

for c in cross_refs:
    # detect ***some text*** or do something else here
    if c.getnext().tag == "cross-ref":
        print c.text

I need to modify it so the output will be only [t3].

Upvotes: 1

Views: 491

Answers (2)

har07

Reputation: 89325

The three nested ifs in your answer can also be expressed in XPath as follows:

following-sibling::node()[1][self::cross-ref]

In short, the XPath returns the nearest following sibling node only if it is a cross-ref element. Note that node() here matches any kind of node, so a text node between the two elements counts as the nearest sibling and prevents a match. The XPath can be used as follows:

for c in cross_refs:
    if c.xpath('boolean(following-sibling::node()[1][self::cross-ref])'):
        print c.text

Or, if you prefer, you can select only the cross-ref elements that match this criterion in the first place:

cross_refs = tree.xpath('//cross-ref[following-sibling::node()[1][self::cross-ref]]')
for c in cross_refs:
    print c.text
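As a minimal, self-contained sketch of the XPath above, the snippet below runs it against the fragment from the question. The `<p>` wrapper element, the `SAMPLE` string, and the `adjacent_cross_refs` helper are assumptions added only so the fragment parses and the result can be inspected; they are not part of the original code. `print(...)` with a single argument is used so the snippet runs under both Python 2.7 and Python 3.

```python
from lxml import etree

# Hypothetical wrapper: the question's fragment placed inside a <p> root
# so that it forms a well-formed document.
SAMPLE = (
    "<p>"
    "<cross-ref> [t1] </cross-ref> ***some text*** <cross-ref> [t2] </cross-ref>  \n"
    "<cross-ref> [t3] </cross-ref><cross-ref> [t4] </cross-ref>"
    "</p>"
)


def adjacent_cross_refs(root):
    """Return the stripped text of each cross-ref whose nearest following
    sibling node (text or element) is itself a cross-ref element."""
    hits = root.xpath(
        "//cross-ref[following-sibling::node()[1][self::cross-ref]]"
    )
    return [el.text.strip() for el in hits]


root = etree.fromstring(SAMPLE)
result = adjacent_cross_refs(root)
print(result)
```

With this input only [t3] should be selected: the nearest sibling node of [t1] is the "***some text***" text node, the nearest sibling node of [t2] is the whitespace and newline that follow it, and [t4] has no following sibling at all.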

Upvotes: 2

sheshkovsky

Reputation: 1340

I solved the problem using the tail property. When c.tail is None, the two elements are adjacent with no text between them. The code looks like this:

for c in cross_refs:
    if c.getnext() is not None:
        if c.getnext().tag == c.tag:
            if c.tail is None:
                print c.text
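One thing to keep in mind with the tail check is that whitespace between two elements is also stored in tail, so an element followed by a newline and then its sibling does not have a tail of None. The sketch below folds the three conditions into one helper and adds an optional flag for ignoring whitespace-only tails. The `directly_followed` helper, its `ignore_whitespace` parameter, the `<p>` wrapper, and the `SAMPLE` string are all hypothetical names introduced for illustration.

```python
from lxml import etree

# Hypothetical wrapper: the question's fragment placed inside a <p> root.
SAMPLE = (
    "<p>"
    "<cross-ref> [t1] </cross-ref> ***some text*** <cross-ref> [t2] </cross-ref>  \n"
    "<cross-ref> [t3] </cross-ref><cross-ref> [t4] </cross-ref>"
    "</p>"
)


def directly_followed(el, ignore_whitespace=False):
    """True if the next sibling element has the same tag as `el`
    and no text sits between the two elements."""
    nxt = el.getnext()
    if nxt is None or nxt.tag != el.tag:
        return False
    tail = el.tail or ""  # tail is None when nothing follows the end tag
    if ignore_whitespace:
        tail = tail.strip()  # treat spaces and newlines as "no text"
    return tail == ""


root = etree.fromstring(SAMPLE)
refs = root.findall("cross-ref")

# Strict check: same outcome as testing `c.tail is None` on this input.
strict = [c.text.strip() for c in refs if directly_followed(c)]
# Loose check: whitespace-only tails no longer block a match.
loose = [c.text.strip() for c in refs if directly_followed(c, ignore_whitespace=True)]

print(strict)
print(loose)
```

On this sample the strict check selects only [t3], while the loose check also selects [t2], because only whitespace and a newline separate it from [t3]. Which behaviour is wanted depends on whether line breaks in the source should count as text.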

Upvotes: 0
