Extracting full text from HTML span element with XPath expression

I have an HTML tree which looks like this:

<div id="RF4FOEQ3OPBEX" data-hook="review" class="a-section review aok-relative"><div 
   <div data-hook="review-collapsed" aria-expanded="false" class="a-expander-content reviewText review-text-content a-expander-partial-collapse-content">
      <span> 
             Text line1. 
             <br>
             Text line2. 
       </span>

I am trying to extract all the text from the span with the following XPath expression:

//div[@data-hook="review"]//div[@data-hook="review-collapsed"]/span/text()

However, this only returns the first line of text, up to the <br>. How can I extract the full text content of the span? Any help is much appreciated.

Upvotes: 0

Views: 503

Answers (1)

Umair Ayub

Reputation: 21241

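Use //text() instead of /text() to select every text node under the span, and call getall() to retrieve all of them rather than just the first.

getall() returns a list of strings, so join it into a single string: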

txt = "".join(response.xpath('//div[@data-hook="review"]//div[@data-hook="review-collapsed"]/span//text()').getall())

Upvotes: 0
