I am trying to retrieve an article number and some other data with XPath. The article number sits inside a div tag, mixed in with other HTML tags and plain text:
<div class="description">
<span class="product-name"></span><br>
details<br>
company<br>
Art.-Nr. (article): 1686382
<div class="product-icons"></div>
</div>
My XPath looks like this:
>>> response.xpath('//div[@id="product-list"]/div[1]/form/div[2]/div[2]').extract_first()
It returns:
'<div class="description">\n<span class="product-name"><b><a href="/gurgelloesungen-tropfen/salviathymol-n-madaus-p11548439">Salviathymol N Madaus</a></b></span><br>\nTropfen, 100 Milliliter, N3<br>\nMEDA Pharma GmbH & Co. KG<br>\nArt.-Nr. (PZN): 11548439\n<div class="product-icons">\n<div class="rating"><a href="/gurgelloesungen-tropfen/salviathymol-n-madaus-p11548439#reviews" class="sp2p sp-star sp-star-5"></a><span>(<a href="/gurgelloesungen-tropfen/salviathymol-n-madaus-p11548439#reviews">13</a>)</span></div>\n</div>\n</div>'
How can I retrieve the three lines of data (details, company, article no)?
Your current code returns the element node (the whole div with its markup) rather than its text. To get the text, you have to point at the text nodes using text().
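A minimal sketch of that idea outside a spider, assuming lxml is installed (Scrapy's selectors are built on lxml through parsel) and using a trimmed copy of the markup from the question. The [normalize-space()] predicate is an addition here: it skips text nodes that contain only whitespace, so just the three data lines remain. Inside the spider, the equivalent would presumably be appending /text()[normalize-space()] to the question's XPath and calling .extract() instead of .extract_first().

```python
from lxml import html  # Scrapy/parsel selectors evaluate XPath through lxml

# Trimmed copy of the markup returned in the question (ratings block removed)
SNIPPET = """<div class="description">
<span class="product-name"><b><a href="/gurgelloesungen-tropfen/salviathymol-n-madaus-p11548439">Salviathymol N Madaus</a></b></span><br>
Tropfen, 100 Milliliter, N3<br>
MEDA Pharma GmbH & Co. KG<br>
Art.-Nr. (PZN): 11548439
<div class="product-icons"></div>
</div>"""

tree = html.fromstring(SNIPPET)

# text() selects the text nodes that are direct children of the div
# (not the span or nested div); [normalize-space()] drops whitespace-only nodes.
texts = tree.xpath('//div[@class="description"]/text()[normalize-space()]')
lines = [t.strip() for t in texts]

details, company, article = lines
# Keep only the part after the colon in "Art.-Nr. (PZN): 11548439"
article_no = article.split(":", 1)[1].strip()

print(details, company, article_no, sep=" | ")
```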
That is why the following line, which selects the first text node after the third <br>, extracts the article-number line:
response.xpath('//div[@id="product-list"]/div[1]/form/div[2]/div[2]/br[3]/following-sibling::text()[1]').extract_first()
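The same anchoring on <br> elements can fetch each line separately. In this sketch, again assuming lxml and a trimmed copy of the question's markup, following-sibling::text()[1] picks only the first text node after the n-th <br>, and normalize-space() trims the surrounding newlines so the XPath itself returns a plain string. The helper name line_after_br is hypothetical; the expressions themselves are standard XPath 1.0 and should also work when passed to response.xpath(...).

```python
from lxml import html

# Trimmed copy of the question's markup
SNIPPET = """<div class="description">
<span class="product-name">Salviathymol N Madaus</span><br>
Tropfen, 100 Milliliter, N3<br>
MEDA Pharma GmbH & Co. KG<br>
Art.-Nr. (PZN): 11548439
<div class="product-icons"></div>
</div>"""

tree = html.fromstring(SNIPPET)


def line_after_br(n):
    """Hypothetical helper: trimmed text of the first text node after the n-th <br>."""
    # normalize-space() on a node-set uses its first node, so the result is a string
    return tree.xpath(
        f'normalize-space(//div[@class="description"]/br[{n}]/following-sibling::text()[1])'
    )


details = line_after_br(1)
company = line_after_br(2)
# substring-after() strips the "Art.-Nr. (PZN):" label directly in XPath
article_no = tree.xpath(
    'normalize-space(substring-after('
    '//div[@class="description"]/br[3]/following-sibling::text()[1], ":"))'
)

print(details, company, article_no, sep=" | ")
```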