Divide list of elements in Scrapy output into separate rows

Question

I am trying to split the output from Scrapy into separate rows in an Excel file, but I get something like this:

In other words, each value of variant id, price and name should be placed on a separate row in Excel.

I am using the scrapy-xlsx 0.1.1 library to export the output to an xlsx file (it cannot be CSV).

Please tell me where the issue is.

import re

import scrapy

from ..items import ZooplusItem


class ZooplusDeSpider(scrapy.Spider):
    name = 'zooplus_de'
    allowed_domains = ['zooplus.de']
    start_urls = ['https://www.zooplus.de/shop/hunde/hundefutter_trockenfutter/diaetfutter']

    def parse(self, response):
        for link in response.css('.MuiGrid-root.MuiGrid-container.MuiGrid-spacing-xs-2.MuiGrid-justify-xs-flex-end'):
            items = ZooplusItem()
            redirect_urls = response.request.meta.get('redirect_urls')
            items['url'] = link.redirect_urls[0] if redirect_urls else response.request.url
            items['product_url'] = link.css('.MuiGrid-root.product-image a::attr(href)').getall()
            items['title'] = link.css('h3 a::text').getall()
            items['id'] = link.css('h3 a::attr(id)').getall()

            items['review'] = link.css('span.sc-fzoaKM.kVcaXm::text').getall()
            items['review'] = re.sub(r'\D', " ", str(items['review']))
            items['review'] = items['review'].replace(" ", "")
            #items['review'] = int(items['review'])

            items['rate'] = len(link.css('a.v3-link i[role=full-star]'))
            items['variant_id'] = [i.strip().split('/n') for i in link.css('.jss114.jss115::text').extract()]
            items['variant_name'] = [i.strip().split('/n') for i in link.css('.sc-fzqARJ.cHdpSy:not(.jss114.jss115)::text').extract()]
            items['variant_price'] = [i.strip().split('/n') for i in link.css('div.product__prices_col meta::attr(content)').extract()]

            yield items

tomjn · Accepted Answer

If you want to store all the variants with common information duplicated, then you need to loop through each variant and yield that separately. You can copy the common information you've already collected and add to that.

In summary, replace

items['variant_id'] = [i.strip().split('/n') for i in link.css('.jss114.jss115::text').extract()]
items['variant_name'] = [i.strip().split('/n') for i in link.css('.sc-fzqARJ.cHdpSy:not(.jss114.jss115)::text').extract()]
items['variant_price'] = [i.strip().split('/n') for i in link.css('div.product__prices_col meta::attr(content)').extract()]

yield items

with something like

for i in link.css("[data-zta='product-variant']"):
    variant = items.copy()
    variant["variant_id"] = i.attrib["data-variant-id"]
    variant["variant_name"] = "".join(i.css(".title > div::text").getall()).strip()
    variant['variant_price'] = i.css("[itemprop='price']::attr(content)").get()
 
    yield variant
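
The copy-and-yield pattern can also be tried outside Scrapy. The following is only a minimal sketch with plain dictionaries: the helper name explode_variants and the sample product data are made up for illustration and are not part of the spider or of scrapy-xlsx. It shows why each variant ends up on its own row: every yielded dict is a fresh copy of the shared fields plus the fields of exactly one variant.

```python
from copy import deepcopy


def explode_variants(common, variants):
    """Yield one flat row per variant, repeating the shared product fields.

    `common` holds the fields shared by every variant (title, rate, ...);
    `variants` is a list of dicts, one per variant. Both are hypothetical
    stand-ins for the data the spider collects.
    """
    for variant in variants:
        row = deepcopy(common)  # fresh copy, so rows never share state
        row.update(variant)     # add the fields of this single variant
        yield row


# Made-up sample data, for illustration only.
common = {"title": "Example Diet Food", "rate": 4}
variants = [
    {"variant_id": "1001", "variant_name": "2 kg", "variant_price": "9.99"},
    {"variant_id": "1002", "variant_name": "10 kg", "variant_price": "39.99"},
]

rows = list(explode_variants(common, variants))
for row in rows:
    print(row)
```

In the spider, each such row corresponds to one yielded item, which the exporter should then write as one line of the output file.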
