Reputation: 3100
I am trying to extract price information from the following two pages:
http://jujumarts.com/mobiles-accessories-smartphones-wildfire-sdarkgrey-p-551.html
http://jujumarts.com/computers-accessories-transcend-500gb-portable-storejet-25d2-p-2616.html
xpath1 = //span[@class='productSpecialPrice']//text()
xpath2 = //div[@class='proDetPrice']//text()
So far I have written Python code that returns the result of xpath1 if it matches anything and otherwise evaluates xpath2. I suspect this logic can be implemented in XPath alone; can someone tell me how?
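The question does not show its Python code, so here is a minimal sketch of the fallback it describes. The helper names `first_nonempty` and `text_nodes`, the simplified markup, and the use of the standard library's `xml.etree.ElementTree` (instead of lxml, so the sketch runs without third-party packages) are all assumptions. ElementTree supports only a subset of XPath and has no `//text()` step, so `itertext()` stands in for it here.

```python
import xml.etree.ElementTree as ET

def first_nonempty(evaluate, expressions):
    """Evaluate each expression in turn and return the first non-empty result.

    `evaluate` is any callable mapping an expression to a list of matches;
    with lxml this could simply be `doc.xpath`.
    """
    for expr in expressions:
        result = evaluate(expr)
        if result:  # an empty list means "no match", so try the next expression
            return result
    return []

def text_nodes(root):
    """Return an evaluator that collects the text under every element matching a path.

    itertext() stands in for the //text() step, which ElementTree lacks.
    """
    def evaluate(path):
        return [t for el in root.findall(path) for t in el.itertext()]
    return evaluate

# ElementTree counterparts of xpath1 and xpath2, minus the //text() step
PATHS = [
    ".//span[@class='productSpecialPrice']",
    ".//div[@class='proDetPrice']",
]

# Hypothetical, simplified markup; not the real structure of the two pages
page1 = ET.fromstring(
    "<html><body><div class='proDetPrice'><s>Rs.13,299.00</s>"
    "<span class='productSpecialPrice'>Rs.11,800.00</span></div></body></html>"
)
page2 = ET.fromstring(
    "<html><body><div class='proDetPrice'>Rs.7,000.00</div></body></html>"
)

print(first_nonempty(text_nodes(page1), PATHS))  # ['Rs.11,800.00']
print(first_nonempty(text_nodes(page2), PATHS))  # ['Rs.7,000.00']
```

With lxml, the same helper could be called as `first_nonempty(doc.xpath, [xpath1, xpath2])`, passing the full expressions from the question.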
Upvotes: 0
Views: 1628
Reputation: 879073
Use | to indicate union:
xpath3 = "//span[@class='productSpecialPrice']//text()|//div[@class='proDetPrice']//text()"
This is not exactly what you asked for, since it returns the matches of both expressions rather than falling back from one to the other, but I think it could be incorporated into a workable solution.
From the XPath 1.0 specification:
The | operator computes the union of its operands, which must be node-sets.
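Because the operands are node-sets, a node matched by both sides appears only once in the result, and lxml returns the merged set in document order. That is why, in the output below, the union on the first page yields two prices rather than three: the special-price text node is matched by both expressions. A toy model in plain Python, where each text node is a hypothetical (document position, text) pair standing in for a real node, and `xpath_union` is an illustrative name:

```python
# Toy model: each text node is a (document_position, text) pair.
# Positions are hypothetical; real nodes come from the parsed page.
special = {(2, 'Rs.11,800.00')}                       # matched by xpath1
detail = {(1, 'Rs.13,299.00'), (2, 'Rs.11,800.00')}   # matched by xpath2

def xpath_union(*node_sets):
    """Union of node-sets: each node at most once, returned in document order."""
    merged = set().union(*node_sets)
    # Sorting the tuples orders them by document position first
    return [text for _, text in sorted(merged)]

print(xpath_union(special, detail))  # ['Rs.13,299.00', 'Rs.11,800.00']
```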
For example,
import lxml.html as LH

urls = [
    'http://jujumarts.com/mobiles-accessories-smartphones-wildfire-sdarkgrey-p-551.html',
    'http://jujumarts.com/computers-accessories-transcend-500gb-portable-storejet-25d2-p-2616.html'
]
xpaths = [
    "//span[@class='productSpecialPrice']//text()",  # xpath1
    "//div[@class='proDetPrice']//text()",           # xpath2
    # union of xpath1 and xpath2
    "//span[@class='productSpecialPrice']//text()|//div[@class='proDetPrice']//text()"
]
for url in urls:
    doc = LH.parse(url)
    # print the text nodes each expression selects on this page
    for xpath in xpaths:
        print(doc.xpath(xpath))
    print()  # blank line between pages
yields
['Rs.11,800.00']
['Rs.13,299.00', 'Rs.11,800.00']
['Rs.13,299.00', 'Rs.11,800.00']
[]
['Rs.7,000.00']
['Rs.7,000.00']
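One way the union could feed into a workable solution, sketched under an assumption that holds only for the two sample pages above: when a special price is present, it is the last price text in document order. Taking the last non-blank text node then reproduces what the fallback returns (Rs.11,800.00 for the first page, Rs.7,000.00 for the second). The function name `pick_price` is hypothetical.

```python
def pick_price(texts):
    """Return the last non-blank text node from the union query, or None.

    Assumption (based only on the two sample pages): when a special price
    exists, it is the last price text in document order.
    """
    # Drop whitespace-only text nodes, which HTML pages often contain
    prices = [t.strip() for t in texts if t.strip()]
    return prices[-1] if prices else None

# Results of the union query printed above
print(pick_price(['Rs.13,299.00', 'Rs.11,800.00']))  # Rs.11,800.00
print(pick_price(['Rs.7,000.00']))                   # Rs.7,000.00
```

If a page ever lists another price after the special one, this heuristic breaks, so the XPath fallback or an explicit check is safer.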
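Another way to get at the information you want is a single predicate using or. On these pages it should select the same text nodes as the union, although it is slightly broader, since //* matches any element carrying either class, not just the span and the div:
"//*[@class='productSpecialPrice' or @class='proDetPrice']//text()"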
Upvotes: 4