merlin

Reputation: 2927

How to get in-text info between tags with XPath?

I am trying to retrieve an article number and some other data with XPath. The article number sits inside a div tag, surrounded by other HTML tags and text:

<div class="description">
    <span class="product-name"></span><br>
    details<br>
    company<br>
    Art.-Nr. (article): 1686382
    <div class="product-icons"></div>
</div>

My XPath looks like this:

>>> response.xpath('//div[@id="product-list"]/div[1]/form/div[2]/div[2]').extract_first()

response:

'<div class="description">\n<span class="product-name"><b><a href="/gurgelloesungen-tropfen/salviathymol-n-madaus-p11548439">Salviathymol N Madaus</a></b></span><br>\nTropfen, 100 Milliliter, N3<br>\nMEDA Pharma GmbH &amp; Co. KG<br>\nArt.-Nr. (PZN): 11548439\n<div class="product-icons">\n<div class="rating"><a href="/gurgelloesungen-tropfen/salviathymol-n-madaus-p11548439#reviews" class="sp2p sp-star sp-star-5"></a><span>(<a href="/gurgelloesungen-tropfen/salviathymol-n-madaus-p11548439#reviews">13</a>)</span></div>\n</div>\n</div>'

How can I retrieve the three lines of data (details, company, article no)?

Upvotes: 0

Views: 24

Answers (1)

supputuri

Reputation: 14145

Your current code returns the element node rather than its text. To get the text, the XPath has to point at the text nodes themselves, using text().

The line below selects the first text node that follows the third br inside the description div, which is the article number line:

response.xpath('//div[@id="product-list"]/div[1]/form/div[2]/div[2]/br[3]/following-sibling::text()').extract_first()
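
To get all three lines (details, company, article number) in one go, one option is to select every text node that is a direct child of the description div and drop the whitespace-only ones. The following is a minimal sketch under a few assumptions: it uses lxml, the library Scrapy's selectors are built on, so it can run without a crawl; SAMPLE is a trimmed copy of the output shown in the question; and description_lines is a hypothetical helper name, not part of Scrapy or lxml.

```python
from lxml import html

# Trimmed copy of the markup from the question (assumption: the real page
# has the same structure inside the description div).
SAMPLE = """<div id="product-list"><div class="description">
    <span class="product-name"><b>Salviathymol N Madaus</b></span><br>
    Tropfen, 100 Milliliter, N3<br>
    MEDA Pharma GmbH &amp; Co. KG<br>
    Art.-Nr. (PZN): 11548439
    <div class="product-icons"></div>
</div></div>"""


def description_lines(markup):
    """Return the non-empty text lines that sit directly inside the description div."""
    tree = html.fromstring(markup)
    # text() matches only text nodes that are direct children of the div,
    # so the product name inside the span is not included.
    texts = tree.xpath('//div[@class="description"]/text()')
    # Strip surrounding whitespace and drop nodes that contain nothing else.
    return [t.strip() for t in texts if t.strip()]


if __name__ == "__main__":
    for line in description_lines(SAMPLE):
        print(line)
```

In the Scrapy shell the same expression should work as response.xpath('//div[@class="description"]/text()').extract(), followed by the same stripping step. Whether the class selector is specific enough on the real page is an assumption; if it is not, the positional path from the question can be used with /text() appended.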

Upvotes: 1
