area9

XPath: obtaining two nodes, with a default value when one of them is missing

I am using XPath in Python 2.7 with lxml:

from lxml import html
...
tree = html.fromstring(source)
results = tree.xpath(...xpath string...)
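To make the mismatch concrete, here is a small self-contained reproduction. The HTML is a hypothetical stand-in for `source` (only the class names come from the question), and the two paths are the ones labelled (1) and (2) below. Evaluated separately, they return lists of different lengths, with nothing tying each result of (2) back to its node in (1).

```python
from lxml import html

# Hypothetical markup: the class names come from the question, but the
# structure and text are invented. The second link has no matching <div>.
source = """
<html><body>
  <a class="hyperlinkClass"><span>first</span><div class="divClassName"><span class="spanClassName">one</span></div></a>
  <a class="hyperlinkClass"><span>second</span></a>
  <a class="hyperlinkClass"><span>third</span><div class="divClassName"><span class="spanClassName">three</span></div></a>
</body></html>
"""
tree = html.fromstring(source)

# Path (1): one text node per link
names = tree.xpath('//a[@class="hyperlinkClass"]/span/text()')

# Path (2): only links that actually contain the <div> produce a result
extras = tree.xpath('//a[@class="hyperlinkClass"]/span'
                    '/following-sibling::div[@class="divClassName"]'
                    '/span[@class="spanClassName"]/text()')

print(len(names))   # 3
print(len(extras))  # 2, so the lists no longer line up
```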

The problem is the XPath string itself, and I am getting quite lost. I am trying to get all the nodes from one path like this:

//a[@class="hyperlinkClass"]/span/text()    (1)

There are no missing entries in this part, and it works fine. But I am also trying to get a part relative to it, like so:

//a[@class="hyperlinkClass"]/span/following-sibling::div[@class="divClassName"]/span[@class="spanClassName"]/text()    (2)

This works fine on its own, but (2) may or may not have a node for each node in (1). What I would like is a default value, say "absent", whenever (2) is missing or empty for a given node in (1). This sounds straightforward, and maybe it is, but I'm hitting a brick wall.

Using '(1) | (2)' gives me all the values I need, but no way to match them up. Using '(1) | concat((2), "absent")' doesn't work either: concat doesn't seem to work in Python, although I've read that it is valid XPath. I also saw the "Becker method" here, but I can't get that to work either.
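For context: in XPath 1.0, `concat()` is valid and lxml evaluates it, but it returns a single string rather than a node-set. The `|` operator only accepts node-sets as operands, and when `concat()` is given a node-set it uses only the string value of the first node. A per-node variant of the concat/substring trick commonly called the Becker method can produce the default, provided it is evaluated once per node from Python. The sketch below assumes hypothetical markup; `extra` and `with_default` are illustrative names, not part of the question.

```python
from lxml import html

# Hypothetical markup; only the class names come from the question.
source = """
<html><body>
  <a class="hyperlinkClass"><span>first</span><div class="divClassName"><span class="spanClassName">one</span></div></a>
  <a class="hyperlinkClass"><span>second</span></a>
  <a class="hyperlinkClass"><span>third</span><div class="divClassName"><span class="spanClassName">three</span></div></a>
</body></html>
"""
tree = html.fromstring(source)

# concat() works in lxml, but it returns ONE string built from the
# first matching node only, not one value per node.
first_only = tree.xpath('concat(//a[@class="hyperlinkClass"]/span, "-suffix")')

# Path (2), written relative to each <span> matched by path (1)
extra = ('./following-sibling::div[@class="divClassName"]'
         '/span[@class="spanClassName"]')

# Becker-style selection: substring(s, 1 div cond) is s when cond is true,
# and '' when cond is false (1 div 0 is Infinity, past the end of s).
with_default = ('concat({0}, substring("absent", 1 div not(string({0}))))'
                .format(extra))

# Evaluate the expression once per node of (1), with that node as context
pairs = [(span.text, span.xpath(with_default))
         for span in tree.xpath('//a[@class="hyperlinkClass"]/span')]
# pairs: [('first', 'one'), ('second', 'absent'), ('third', 'three')]
```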

Hopefully someone can shed some light on how to get this working, or on whether it's even possible.

Answers (1)

Tomalak

Don't make this more complicated than it is:

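from __future__ import print_function  # so print() behaves the same on Python 2.7 and 3

# Absolute path: every <span> directly inside a matching <a>
path1 = '//a[@class="hyperlinkClass"]/span'
# Relative path, evaluated with each of those <span> elements as the context node
path2 = './following-sibling::div[@class="divClassName"]/span[@class="spanClassName"]'

for link in tree.xpath(path1):
    other_node = link.xpath(path2)
    if other_node:  # an empty list means (2) has no match for this node
        print(link.text, other_node[0].text)
    else:
        print(link.text, 'n/a')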
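If the goal is the list of pairs the question asks for, with "absent" as the default, the same per-node loop can collect tuples instead of printing. This is a self-contained sketch: `pair_up` and `SAMPLE` are hypothetical names, the markup is invented, and an empty `<span class="spanClassName">` is treated the same as a missing one (an assumption about what "missing/empty" should mean).

```python
from lxml import html

PATH1 = '//a[@class="hyperlinkClass"]/span'
PATH2 = ('./following-sibling::div[@class="divClassName"]'
         '/span[@class="spanClassName"]')

def pair_up(source, default='absent'):
    """Return a (text of (1), text of (2) or default) tuple for every node in (1)."""
    tree = html.fromstring(source)
    pairs = []
    for span in tree.xpath(PATH1):
        other = span.xpath(PATH2)
        # No match and an empty match both fall back to the default
        value = other[0].text if other else None
        pairs.append((span.text, value if value else default))
    return pairs

# Hypothetical markup: the second link has no <div>, the third has an empty <span>
SAMPLE = """
<div>
  <a class="hyperlinkClass"><span>first</span><div class="divClassName"><span class="spanClassName">one</span></div></a>
  <a class="hyperlinkClass"><span>second</span></a>
  <a class="hyperlinkClass"><span>third</span><div class="divClassName"><span class="spanClassName"></span></div></a>
</div>
"""

print(pair_up(SAMPLE))
```

Returning a list keeps the pairing explicit, so the caller can filter, format, or print it as needed.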
