Reputation: 341
I am trying to write to CSV, but when I check the output, some 'reviews' fields are left blank, even though they show up correctly when I print them. I believe this is a zip() limitation, as I am using it to write the values column-wise rather than 10 in a row. Again, the XPath output I print in the spider is correct. Is it a limitation of zip, or is my syntax wrong? My other guess is that it might be the delimiter=','.
Pipline.py
import csv
import itertools
from string import maketrans


class CSVPipeline(object):

    def __init__(self):
        self.csvwriter = csv.writer(open('Output.csv', 'wb'), delimiter=',')
        self.csvwriter.writerow(['names', 'date', 'location', 'starts', 'subjects', 'reviews'])

    def process_item(self, item, ampa):
        rows = zip(item['names'], item['date'], item['location'], item['stars'], item['subjects'], item['reviews'])
        for row in rows:
            self.csvwriter.writerow(row)
        return item
Sample output (some reviews are missing):
names,date,location,starts,subjects,reviews
Aastha2015,20 July 2015,"
Bengaluru (Bangalore), India
",5,Amazing Time in Ooty,"
Hi All, i visited Ooty on July 10th, choose to stay in Elk Hills hotel, i read reviews of almost all good hotels and decided to try Elk Hills. I must say the property is huge, very well maintained. Rooms are clean spacious & views are great. Food in the Cafe Blue was awesome. They forgot to give us the...
"
pushp2015,11 July 2015,"
Gurgaon, India
",3,Nice Hotel ...under going maintainance,"
"
REDDY84,25 June 2015,"
Chennai, India
",4,Good old property,"
Its an old property with a very good view. We booked a suite at a very reasonable price but they charged for an extra bed 1500 + txs which i feel was not required because the bed was already their in the suite room.Other then that everything was good. Breakfast was nice . The room they had given was neat...
"
arun606,20 June 2015,"
Mumbai, India
",5,Amazing Hospitality,"
"
Upvotes: 1
Views: 85
Reputation: 341
Figured it out. As @Martin Evans suggested, I checked the lengths of the lists and found that there were many entries consisting of just a carriage return ("\n"), which ended up as blank fields in the output. I don't know why they are there, but they are. To fix it, just add this code:
while "\n" in yourlist['key']: yourlist['key'].remove("\n")
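A slightly more general variant of the same idea, as a rough sketch: compare the list lengths first, then drop every entry that is empty or whitespace-only rather than only exact "\n" matches. The helper name drop_blank and the sample data are made up for illustration, and this assumes the unwanted entries really are whitespace-only strings.

```python
def drop_blank(values):
    """Return a new list without entries that are empty or whitespace-only."""
    return [v for v in values if v.strip()]


# Hypothetical scraped data (not taken from the real spider): the stray
# "\n" entries make 'reviews' longer than 'names', so zip() pairs the
# wrong values together.
item = {
    'names': ['Aastha2015', 'pushp2015'],
    'reviews': ['\n', 'Hi All, i visited Ooty...', '\n', 'Nice hotel...'],
}

# Lengths differ before cleaning.
print(len(item['names']), len(item['reviews']))

item['reviews'] = drop_blank(item['reviews'])

# Lengths match after cleaning.
print(len(item['names']), len(item['reviews']))
```

Unlike the while loop, this builds a new list instead of modifying the existing one in place, and it also catches entries such as "\r\n" or a run of spaces.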
Upvotes: 0
Reputation: 3420
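I'm not sure, but I think what you call a limitation is simply how zip works: it stops as soon as the shortest input is exhausted.
Check out itertools.izip_longest (named itertools.zip_longest in Python 3), which does not stop at the shortest list and pads the missing values instead.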
Example:
>>> zip('abc', '12345')
[('a', '1'), ('b', '2'), ('c', '3')]
>>> list(itertools.izip_longest('abc', '12345', fillvalue=0))
[('a', '1'), ('b', '2'), ('c', '3'), (0, '4'), (0, '5')]
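For Python 3, here is a minimal sketch of how zip_longest might replace zip when writing the rows, so that a short column shows up as an empty cell instead of silently cutting off the output. The column data and the in-memory buffer are made up for illustration; a real pipeline would write to a file opened with newline=''.

```python
import csv
import io
from itertools import zip_longest

# Hypothetical columns; 'reviews' is one entry short on purpose.
names = ['Aastha2015', 'pushp2015', 'REDDY84']
stars = ['5', '3', '4']
reviews = ['Great stay', 'Good old property']

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=',')
writer.writerow(['names', 'stars', 'reviews'])

# zip_longest keeps going until the longest list is exhausted and
# fills the gaps with fillvalue.
for row in zip_longest(names, stars, reviews, fillvalue=''):
    writer.writerow(row)

print(buffer.getvalue())
```

Note that this only makes the mismatch visible. If the lists are misaligned because of stray entries, the values still end up in the wrong rows, so cleaning the lists first is probably still needed.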
Upvotes: 1