<br> tags screw up my data using Scrapy and Python

Question

I am trying to scrape the text of Amazon reviews using Scrapy. The problem is that when a review contains line breaks, the text inside its span element is split up by <br> tags. To scrape the first review I use this line of code:

response.css('span.a-size-base.review-text::text').extract_first()

This does not give me the whole review, only the text between the opening <span> tag and the first <br> tag.

I know that if I replace extract_first() with extract(), I get all the text. However, this also gives me the text of all the other reviews.

So basically, extract() returns a list whose items are split at the <br> tags, across every review on the page. I need the text split per <span> element instead: one item per review.

Is there a way to scrape all the text between the opening <span> tag and its closing </span> tag?

Example of the HTML code:

<span data-hook="review-body" class="a-size-base review-text">
    "I like this product, the reasons why are explained below"
    <br>
    <br>
    "1. It looks nice"
    <br>
    "2. I love it"
</span>

What it looks like on the site:

I like this product, the reasons why are explained below

1. It looks nice
2. I love it

Output I get using extract_first():

"I like this product, the reasons why are explained below"

Output I get using extract() (note that it consists of three elements):

"I like this product, the reasons why are explained below", "1. It looks nice", "2. I love it"

Output I want to get (only one element, the review itself):

"I like this product, the reasons why are explained below 1. It looks nice 2. I love it"
