Reputation: 391
I am using xpath in Python 2.7 with lxml:
from lxml import html
...
tree = html.fromstring(source)
results = tree.xpath(...xpath string...)
The problem is the xpath string, and I am getting quite lost in it. I am trying to get all the nodes from one path like this:
//a[@class="hyperlinkClass"]/span/text() (1)
There are no missing entries in this part and this works fine. But I'm also trying to get a part relative to this as well, like so:
//a[@class="hyperlinkClass"]/span/following-sibling::div[@class="divClassName"]/span[@class="spanClassName"]/text() (2)
This works fine by itself, but (2) may or may not have a node for each node in (1). What I would like is a default value, say "absent", for each (1) where (2) is missing or empty. This sounds straightforward and maybe it is, but I'm hitting a brick wall here.
By doing '(1) | (2)' I get all the values needed, but have no way to match them up. If I do '(1) | concat((2), "absent")', this doesn't work either: concat doesn't seem to work in Python, though I've read that it is valid in xpath. I also saw the "Becker method" here, but that doesn't work either (or I can't get it to).
Hopefully someone can shed some light on how to get this working, or on whether it's even possible.
Upvotes: 1
Views: 280
Reputation: 338278
Don't make this more complicated than it is:
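# path1 selects each span of interest; path2 is evaluated relative to that span
path1 = '//a[@class="hyperlinkClass"]/span'
path2 = './following-sibling::div[@class="divClassName"]/span[@class="spanClassName"]'
for link in tree.xpath(path1):
    # xpath() returns a list, which is empty when there is no matching sibling
    other_node = link.xpath(path2)
    if other_node:
        print(link.text, other_node[0].text)
    else:
        print(link.text, 'n/a')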
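To try the same idea end to end, here is a minimal sketch that collects the pairs into a list instead of printing them. The sample markup, the SAMPLE, PATH1 and PATH2 names, and the paired_texts helper are all made up for illustration (they are not from the question), and the default string "absent" is the one the question suggested.

```python
from lxml import html

# Hypothetical sample markup: the second link has no matching div.
SAMPLE = (
    '<div>'
    '<a class="hyperlinkClass"><span>first</span>'
    '<div class="divClassName"><span class="spanClassName">extra-1</span></div></a>'
    '<a class="hyperlinkClass"><span>second</span></a>'
    '<a class="hyperlinkClass"><span>third</span>'
    '<div class="divClassName"><span class="spanClassName">extra-3</span></div></a>'
    '</div>'
)

PATH1 = '//a[@class="hyperlinkClass"]/span'
PATH2 = './following-sibling::div[@class="divClassName"]/span[@class="spanClassName"]'


def paired_texts(tree, default="absent"):
    """Return (link text, related text or default) for every PATH1 match."""
    pairs = []
    for link in tree.xpath(PATH1):
        # Evaluate PATH2 relative to this span, so results stay aligned.
        related = link.xpath(PATH2)
        pairs.append((link.text, related[0].text if related else default))
    return pairs


pairs = paired_texts(html.fromstring(SAMPLE))
for name, extra in pairs:
    print("%s: %s" % (name, extra))
```

Running the second path once per node is what keeps each value tied to its own link, which a single combined xpath string cannot do here.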
Upvotes: 2