Adrian

Reputation: 907

Divide a list of elements in Scrapy output into separate rows

I am trying to split the output from Scrapy into separate rows in an Excel file, but I get something like this: [screenshot of the Excel output]

In other words, each variant's id, price and name should be placed in a separate row in Excel.

I am using the scrapy-xlsx 0.1.1 library to export the output to an .xlsx file (it cannot be CSV).

Please tell me where the issue is.

import scrapy
from ..items import ZooplusItem
import re


class ZooplusDeSpider(scrapy.Spider):
    name = 'zooplus_de'
    allowed_domains = ['zooplus.de']
    start_urls = ['https://www.zooplus.de/shop/hunde/hundefutter_trockenfutter/diaetfutter']

    def parse(self, response):
        for link in response.css('.MuiGrid-root.MuiGrid-container.MuiGrid-spacing-xs-2.MuiGrid-justify-xs-flex-end'):
            items = ZooplusItem()
            redirect_urls = response.request.meta.get('redirect_urls')
            items['url'] = redirect_urls[0] if redirect_urls else response.request.url
            items['product_url'] = link.css('.MuiGrid-root.product-image a::attr(href)').getall()
            items['title'] = link.css('h3 a::text').getall()
            items['id'] = link.css('h3 a::attr(id)').getall()

            items['review'] = link.css('span.sc-fzoaKM.kVcaXm::text').getall()
            items['review'] = re.sub(r'\D', " ", str(items['review']))
            items['review'] = items['review'].replace(" ", "")
            # items['review'] = int(items['review'])

            items['rate'] = len(link.css('a.v3-link i[role=full-star]'))
            items['variant_id'] = [i.strip().split('/n') for i in link.css('.jss114.jss115::text').extract()]
            items['variant_name'] = [i.strip().split('/n') for i in link.css('.sc-fzqARJ.cHdpSy:not(.jss114.jss115)::text').extract()]
            items['variant_price'] = [i.strip().split('/n') for i in link.css('div.product__prices_col meta::attr(content)').extract()]

            yield items

Upvotes: 0

Views: 96

Answers (1)

tomjn

Reputation: 5389

If you want to store each variant as its own row, with the common product information duplicated, then you need to loop through the variants and yield each one separately. You can copy the common information you've already collected and add the variant-specific fields to that copy.
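If the variant fields are already collected as three parallel lists (as the question's selectors produce, minus the `.split('/n')`), the same idea can be sketched in plain Python with `zip()`. The function name `split_variants` and the sample values are hypothetical; it assumes that index k in each list describes the same variant.

```python
from typing import Any, Dict, Iterator, List


def split_variants(common: Dict[str, Any],
                   variant_ids: List[str],
                   variant_names: List[str],
                   variant_prices: List[str]) -> Iterator[Dict[str, Any]]:
    """Yield one row per variant, repeating the shared product fields.

    Assumes the three lists are parallel. zip() stops at the shortest
    list, so a missing value silently drops the trailing variants.
    """
    for vid, name, price in zip(variant_ids, variant_names, variant_prices):
        row = dict(common)          # shallow copy of the shared fields
        row["variant_id"] = vid
        row["variant_name"] = name
        row["variant_price"] = price
        yield row


# Hypothetical sample data, not scraped from zooplus.de
product = {"title": "Example dry food", "id": "12345"}
rows = list(split_variants(product, ["v1", "v2"], ["2 kg", "12 kg"], ["9.99", "39.99"]))
for r in rows:
    print(r)
```

Iterating over each variant's own container element, as this answer suggests, is more robust than zipping separate lists, because it cannot pair one variant's name with another variant's price when a field is missing.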

In summary, replace

items['variant_id'] = [i.strip().split('/n') for i in link.css('.jss114.jss115::text').extract()]
items['variant_name'] = [i.strip().split('/n') for i in link.css('.sc-fzqARJ.cHdpSy:not(.jss114.jss115)::text').extract()]
items['variant_price'] = [i.strip().split('/n') for i in link.css('div.product__prices_col meta::attr(content)').extract()]

yield items

with something like

for i in link.css("[data-zta='product-variant']"):
    # start from the shared product fields collected above
    variant = items.copy()
    # then fill in the fields that belong to this variant only
    variant["variant_id"] = i.attrib["data-variant-id"]
    variant["variant_name"] = "".join(i.css(".title > div::text").getall()).strip()
    variant['variant_price'] = i.css("[itemprop='price']::attr(content)").get()

    yield variant
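Selecting one value per variant also avoids the nested lists produced by the original comprehensions. `'/n'` is a slash followed by the letter `n`, not the newline escape `'\n'`, so `split('/n')` on a stripped string that contains no `/n` just wraps it in a one-element list. The input strings below are hypothetical stand-ins for `link.css(...).extract()` results:

```python
# Hypothetical text nodes standing in for link.css(...).extract() results
raw = ["  1234.1 ", "\n1234.2\n"]

nested = [s.strip().split('/n') for s in raw]   # the question's pattern
flat = [s.strip() for s in raw]                 # one plain string per value

print(nested)  # [['1234.1'], ['1234.2']]
print(flat)    # ['1234.1', '1234.2']
```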

Upvotes: 1
