Reputation: 1
I have an HTML tree which looks like this:
<div id="RF4FOEQ3OPBEX" data-hook="review" class="a-section review aok-relative"><div
<div data-hook="review-collapsed" aria-expanded="false" class="a-expander-content reviewText review-text-content a-expander-partial-collapse-content">
<span>
Text line1.
<br>
Text line2.
</span>
I am trying to extract all the text from the span with the following XPath expression:
//div[@data-hook="review"]//div[@data-hook="review-collapsed"]/span/text()
However, this only returns the first line of text, up to the <br>. How can I extract the full text content of the span? I would appreciate any help, and thank you in advance.
Upvotes: 0
Views: 503
Reputation: 21241
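Use //text() together with the getall method to get all of the text inside a specific element. The <br> splits the span's content into separate text nodes, so a call that returns a single match (such as get) only gives you the first one. getall returns a list of every matching text node, so just join it: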
txt = "".join(response.xpath('//div[@data-hook="review"]//div[@data-hook="review-collapsed"]/span//text()').getall())
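To see why the text comes back in pieces, here is a minimal standard-library sketch (not Scrapy itself). It assumes a simplified, well-formed copy of the snippet from the question, with <br/> in place of <br>, and uses ElementTree's itertext() as a rough stand-in for //text() followed by getall(). The names SNIPPET, parts, and lines are illustrative only.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed version of the markup from the question
# (<br/> instead of <br> so the XML parser accepts it). This is an assumption
# for illustration, not the real page source.
SNIPPET = """
<div data-hook="review">
  <div data-hook="review-collapsed">
    <span>
      Text line1.
      <br/>
      Text line2.
    </span>
  </div>
</div>
"""

root = ET.fromstring(SNIPPET)
span = root.find(".//div[@data-hook='review-collapsed']/span")

# itertext() yields every text node under the span in document order:
# one piece before the <br/> and one piece after it.
parts = list(span.itertext())

# Strip surrounding whitespace and drop empty pieces before joining.
lines = [p.strip() for p in parts if p.strip()]
full_text = " ".join(lines)
print(full_text)
```

In Scrapy, the same strip-and-filter step can be applied to the list returned by getall() before joining, which avoids stray newlines and indentation in the result.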
Upvotes: 0