Reputation: 13525
I am studying the Scrapy tutorial. To test the process I created a new project with these files:
See my post in the Scrapy group for links to the scripts; I cannot post more than one link here.
The spider runs well: it scrapes the text between the title tags and puts it in FirmItem:
[whitecase.com] INFO: Passed FirmItem(title=[u'White & Case LLP - Lawyers - Rachel B. Wagner '])
But I am stuck at the pipeline stage. I want to append each FirmItem to a CSV file so that I can later load it into the database.
I am new to Python and learning as I go. I would appreciate it if someone gave me a clue about how to make pipelines.py work so that the scraped data is written to items.csv.
Thank you.
Upvotes: 2
Views: 3859
Reputation: 1370
Use the built-in CSV feed export (available since v0.10) together with the CsvItemExporter.
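With the feed export enabled, no custom pipeline code is needed at all; it is driven by two settings. A minimal sketch for settings.py (the file name items.csv is taken from the question; check your Scrapy version's docs for the exact setting names):

```python
# settings.py -- enable the built-in CSV feed export (Scrapy >= 0.10)
FEED_URI = 'items.csv'   # where to write the feed
FEED_FORMAT = 'csv'      # uses CsvItemExporter under the hood
```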
Upvotes: 1
Reputation: 148
I think they address your specific question in the Scrapy Tutorial.
It suggests, as others have here, using the csv module. Place the following in your pipelines.py file.
import csv

class CsvWriterPipeline(object):

    def __init__(self):
        # 'wb' is the Python 2 idiom; on Python 3 use open('items.csv', 'w', newline='')
        self.csvwriter = csv.writer(open('items.csv', 'wb'))

    def process_item(self, domain, item):
        # each item field is a list, so take the first element
        self.csvwriter.writerow([item['title'][0], item['link'][0], item['desc'][0]])
        return item
Don’t forget to enable the pipeline by adding it to the ITEM_PIPELINES setting in your settings.py, like this:
ITEM_PIPELINES = ['dmoz.pipelines.CsvWriterPipeline']
Adjust to suit the specifics of your project.
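To sanity-check the pipeline logic outside Scrapy, you can instantiate the class and feed it a plain dict by hand. This is a sketch: the dict stands in for FirmItem, the sample values are invented, and a close() helper is added so the file handle is flushed (in a real crawl, Scrapy calls process_item for you):

```python
import csv

class CsvWriterPipeline(object):
    def __init__(self):
        # keep the file handle so it can be closed/flushed explicitly
        self.file = open('items.csv', 'w', newline='')
        self.csvwriter = csv.writer(self.file)

    def process_item(self, domain, item):
        # each item field is a list, so take the first element
        self.csvwriter.writerow([item['title'][0], item['link'][0], item['desc'][0]])
        return item

    def close(self):
        self.file.close()

# hypothetical item resembling the tutorial's title/link/desc fields
item = {'title': ['White & Case LLP'],
        'link': ['http://whitecase.com'],
        'desc': ['law firm']}

pipeline = CsvWriterPipeline()
pipeline.process_item('whitecase.com', item)
pipeline.close()

print(open('items.csv').read().strip())
# -> White & Case LLP,http://whitecase.com,law firm
```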
Upvotes: 9
Reputation: 5338
Open a file and write to it.

f = open('my.csv', 'w')
f.write('h1\th2\th3\n')
f.write(my_class.v1 + '\t' + my_class.v2 + '\t' + my_class.v3 + '\n')
f.close()

Or print your results to stdout and then redirect stdout to a file: ./my_script.py >> res.txt
Upvotes: -1
Reputation: 11252
Python has a csv module for reading/writing CSV files; this is safer than formatting the output yourself (and getting all the quoting/escaping right...).

import csv

# keep a handle to the underlying file: csv.writer objects have no close() method
f = open('items.csv', 'w')
csvwriter = csv.writer(f)
csvwriter.writerow([firmitem.title, firmitem.url])
f.close()
Upvotes: 0