Reputation: 341
I'm trying out print out a CSV file after scraping using piplines but the formatting is a bit weird because instead of printing it top to bottom it is printing it all at once after scraping page 1 and then all of page 2 in one column. I have attached piplines.py and one line from csv output(quite large). So how do I make to print column wise instead all at once from one page
pipline.py
# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
from scrapy import signals
from scrapy.contrib.exporter import CsvItemExporter
class CSVPipeline(object):
def __init__(self):
self.files = {}
@classmethod
def from_crawler(cls, crawler):
pipeline = cls()
crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
return pipeline
def spider_opened(self, spider):
file = open('%s_items.csv' % spider.name, 'w+b')
self.files[spider] = file
self.exporter = CsvItemExporter(file)
self.exporter.fields_to_export = ['names','stars','subjects','reviews']
self.exporter.start_exporting()
def spider_closed(self, spider):
self.exporter.finish_exporting()
file = self.files.pop(spider)
file.close()
def process_item(self, item, spider):
self.exporter.export_item(item)
return item
and output.csv
names stars subjects
Vivek0388,NikhilVashisth,DocSharad,Abhimanyu_swarup,Suresh N,kaushalhkapadia,JyotiMallick,Nitin T,mhdMumbai,SunilTukrel(COLUMN 2) 5 of 5 stars,4 of 5 stars,1 of 5 stars,5 of 5 stars,3 of 5 stars,4 of 5 stars,5 of 5 stars,5 of 5 stars,4 of 5 stars,4 of 5 stars(COLUMN 3) Best Stay,Awesome View... Nice Experience!,Highly mismanaged and dishonest.,A Wonderful Experience,Good place with average front office,Honeymoon,Awesome Resort,Amazing,ooty's beauty!!,Good stay and food
It should look something like this
Vivek0388 5 of 5
NikhilVashisth 5 of 5
DocSharad 5 of 5
...so on
EDIT:
items = [{'reviews:':"",'subjects:':"",'names:':"",'stars:':""} for k in range(1000)]
if(sites and len(sites) > 0):
for site in sites:
i+=1
items[i]['names'] = item['names']
items[i]['stars'] = item['stars']
items[i]['subjects'] = item['subjects']
items[i]['reviews'] = item['reviews']
yield Request(url="http://tripadvisor.in" + site, callback=self.parse)
for k in range(1000):
yield items[k]
Upvotes: 3
Views: 1646
Reputation: 341
Figured it out, csv zip it and then for loop it through it and write row. This was MUCH less complicated once you read the docs.
import csv
import itertools
class CSVPipeline(object):
def __init__(self):
self.csvwriter = csv.writer(open('items.csv', 'wb'), delimiter=',')
self.csvwriter.writerow(['names','starts','subjects','reviews'])
def process_item(self, item, ampa):
rows = zip(item['names'],item['stars'],item['subjects'],item['reviews'])
for row in rows:
self.csvwriter.writerow(row)
return item
Upvotes: 3