Smashed

Reputation: 341

CSV export is formatted incorrectly in Scrapy

I'm trying to write a CSV file after scraping, using pipelines, but the formatting is off. Instead of one row per review, each scraped page is written as a single row: all the names from page 1 end up together in one cell, all the star ratings in the next, and so on, and then the same happens for page 2. I have attached pipelines.py and one line of the CSV output (the full file is quite large). How do I make it write one row per review instead of a whole page at once?

pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

from scrapy import signals
from scrapy.contrib.exporter import CsvItemExporter

class CSVPipeline(object):

    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline


    def spider_opened(self, spider):
        file = open('%s_items.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file)
        self.exporter.fields_to_export = ['names','stars','subjects','reviews']
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()


    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

and output.csv

names   stars   subjects
Vivek0388,NikhilVashisth,DocSharad,Abhimanyu_swarup,Suresh N,kaushalhkapadia,JyotiMallick,Nitin T,mhdMumbai,SunilTukrel(COLUMN 2)   5 of 5 stars,4 of 5 stars,1 of 5 stars,5 of 5 stars,3 of 5 stars,4 of 5 stars,5 of 5 stars,5 of 5 stars,4 of 5 stars,4 of 5 stars(COLUMN 3) Best Stay,Awesome View... Nice Experience!,Highly mismanaged and dishonest.,A Wonderful Experience,Good place with average front office,Honeymoon,Awesome Resort,Amazing,ooty's beauty!!,Good stay and food

It should look something like this

Vivek0388      5 of 5
NikhilVashisth 5 of 5
DocSharad      5 of 5
...so on

EDIT:

items = [{'reviews:':"",'subjects:':"",'names:':"",'stars:':""} for k in range(1000)]
if(sites and len(sites) > 0):
    for site in sites:
        i+=1
        items[i]['names'] = item['names']
        items[i]['stars'] = item['stars']
        items[i]['subjects'] = item['subjects']
        items[i]['reviews'] = item['reviews']
        yield Request(url="http://tripadvisor.in" + site, callback=self.parse)
    for k in  range(1000):
        yield items[k]

Upvotes: 3

Views: 1646

Answers (1)

Smashed

Reputation: 341

Figured it out: zip the item's lists together, then loop over the result and write each tuple as a row with the csv module. This was MUCH less complicated once you read the docs.

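import csv

class CSVPipeline(object):

    def __init__(self):
        # Python 3: text mode with newline=''; on Python 2 open with 'wb' instead
        self.file = open('items.csv', 'w', newline='')
        self.csvwriter = csv.writer(self.file, delimiter=',')
        self.csvwriter.writerow(['names', 'stars', 'subjects', 'reviews'])

    def process_item(self, item, spider):
        # Each field holds a list with one entry per review;
        # zip pairs the lists up so that each tuple is one review
        rows = zip(item['names'], item['stars'], item['subjects'], item['reviews'])

        for row in rows:
            self.csvwriter.writerow(row)

        return item

    def close_spider(self, spider):
        # Scrapy calls this when the spider finishes; close the file so all rows are flushed
        self.file.close()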
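
To see what the zip step does outside of Scrapy, here is a minimal sketch that writes to an in-memory buffer instead of items.csv. It assumes Python 3. The sample item borrows the first three names, ratings and subjects from the output in the question; the review strings and the helper name item_to_rows are made up for illustration.

```python
import csv
import io

# Hypothetical item shaped like the question's output: one list per field.
item = {
    "names": ["Vivek0388", "NikhilVashisth", "DocSharad"],
    "stars": ["5 of 5 stars", "4 of 5 stars", "1 of 5 stars"],
    "subjects": [
        "Best Stay",
        "Awesome View... Nice Experience!",
        "Highly mismanaged and dishonest.",
    ],
    # Placeholder review bodies; the real ones are not shown in the question.
    "reviews": ["review text 1", "review text 2", "review text 3"],
}
fields = ["names", "stars", "subjects", "reviews"]


def item_to_rows(item, fields):
    """Transpose the per-field lists into one tuple per review."""
    return list(zip(*(item[f] for f in fields)))


rows = item_to_rows(item, fields)

buffer = io.StringIO()  # in-memory stand-in for items.csv
writer = csv.writer(buffer)
writer.writerow(fields)  # header row
writer.writerows(rows)   # one CSV line per review
csv_text = buffer.getvalue()
print(csv_text)
```

Note that zip stops at the shortest list, so a page where one field has fewer entries than the others would silently lose rows; itertools.zip_longest pads the gaps with a fill value instead.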

Upvotes: 3
