Reputation: 1690
I'm creating a CSV file with my spider, but the data comes out in a weird order:
My code:
class GoodmanSpider(scrapy.Spider):
    name = "goodmans"
    start_urls = ['http://www.goodmans.net/d/1706/brands.htm']

    def parse(self, response):
        items = TutorialItem()
        all_data = response.css('.SubDepartments')
        for data in all_data:
            category = data.css('.SubDepartments a::text').extract()
            category_url = data.css('.SubDepartments a::attr(href)').extract()
            items['category'] = category
            items['category_url'] = category_url
            yield items
My items.py file
The output I want, more or less:
Upvotes: 0
Views: 109
Reputation: 1690
This is the corrected code, based on Michael's answer. It works perfectly:
import scrapy
from ..items import TutorialItem
import pandas as pd


class GoodmanSpider(scrapy.Spider):
    name = "goodmans"
    start_urls = ['http://www.goodmans.net/d/1706/brands.htm']

    def parse(self, response):
        items = TutorialItem()
        all_data = response.css('.SubDepartments')
        for data in all_data:
            category = data.css('.SubDepartments a::text').extract()
            category_url = data.css('.SubDepartments a::attr(href)').extract()
            items['category'] = category
            items['category_url'] = category_url
            for cat, url in zip(category, category_url):
                item = dict(category=cat, category_url=url)
                yield item
Upvotes: 0
Reputation: 1445
You have stacked all your values into a single item. Each item should be a dict with a single value for each key, whereas yours holds a whole list under each key.
Try something like:
for cat, url in zip(category, category_url):
    item = dict(category=cat, category_url=url)
    yield item
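To see what the loop does without running Scrapy, here is a minimal plain-Python sketch. The helper name `split_into_rows` and the sample values are made up for illustration; the two lists only stand in for what the `.extract()` calls return.

```python
def split_into_rows(categories, urls):
    """Yield one dict per (category, url) pair instead of one dict of lists."""
    for cat, url in zip(categories, urls):
        yield dict(category=cat, category_url=url)


# Hypothetical sample data standing in for the two .extract() results.
categories = ["Brand A", "Brand B"]
urls = ["/brand-a.htm", "/brand-b.htm"]

rows = list(split_into_rows(categories, urls))
for row in rows:
    print(row)
```

Each yielded dict should end up as its own row in the exported CSV. One thing to watch: `zip` stops at the shorter list, so if a link has no text the two lists may differ in length and pairs could be dropped or misaligned.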
Upvotes: 1