Zeynel

Reputation: 13525

Newbie Q about Scrapy pipeline.py

I am studying the Scrapy tutorial. To test the process I created a new project with these files:

See my post in the Scrapy group for links to the scripts; I cannot post more than one link here.

The spider runs well: it scrapes the text between the title tags and puts it in a FirmItem:

[whitecase.com] INFO: Passed FirmItem(title=[u'White & Case LLP - Lawyers - Rachel B. Wagner ']) 

But I am stuck on the pipeline step. I want to write this FirmItem to a CSV file so that I can later load it into a database.

I am new to Python and I am learning as I go along. I would appreciate it if someone could give me a clue about how to make pipelines.py work so that the scraped data ends up in items.csv.

Thank you.

Upvotes: 2

Views: 3859

Answers (4)

Daniel Werner

Reputation: 1370

Use the built-in CSV feed export (available in v0.10) together with the CsvItemExporter.

Upvotes: 1

leeo

Reputation: 148

I think they address your specific question in the Scrapy Tutorial.

It suggests, as others have here, using the csv module. Place the following in your pipelines.py file:

import csv

class CsvWriterPipeline(object):

    def __init__(self):
        self.csvwriter = csv.writer(open('items.csv', 'wb'))

    def process_item(self, domain, item):
        self.csvwriter.writerow([item['title'][0], item['link'][0], item['desc'][0]])
        return item

Don’t forget to enable the pipeline by adding it to the ITEM_PIPELINES setting in your settings.py, like this:

ITEM_PIPELINES = ['dmoz.pipelines.CsvWriterPipeline']

Adjust to suit the specifics of your project.
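
For readers on a more recent Scrapy release, here is a minimal sketch of the same idea. It assumes Python 3 and the newer hook signatures, process_item(self, item, spider) together with open_spider and close_spider. The output_path parameter and the choice to write only the title field are illustrative assumptions, not part of the tutorial code above.

```python
import csv


class CsvWriterPipeline:
    """Sketch of a pipeline that writes one CSV row per scraped item."""

    def __init__(self, output_path='items.csv'):
        # output_path is a hypothetical parameter added for illustration.
        self.output_path = output_path
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # newline='' lets the csv module control line endings (Python 3).
        self.file = open(self.output_path, 'w', newline='', encoding='utf-8')
        self.writer = csv.writer(self.file)

    def process_item(self, item, spider):
        # The spider in the question stores title as a list of strings,
        # so take the first entry and trim surrounding whitespace.
        self.writer.writerow([item['title'][0].strip()])
        return item

    def close_spider(self, spider):
        # Close the file so buffered rows are flushed to disk.
        self.file.close()
```

In newer Scrapy versions ITEM_PIPELINES is, as far as I know, a dict that maps the class path to an order number (for example {'dmoz.pipelines.CsvWriterPipeline': 300}) rather than a list, so check the documentation for the version you have installed.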

Upvotes: 9

Elalfer

Reputation: 5338

Open a file and write to it:

f = open('my.csv','w')
f.write('h1\th2\th3\n')
f.write(my_class.v1+'\t'+my_class.v2+'\t'+my_class.v3+'\n')
f.close()

Or print your results to stdout and then redirect stdout to a file: ./my_script.py >> res.txt

Upvotes: -1

Wim

Reputation: 11252

Python has a module for reading and writing CSV files. Using it is safer than writing the output yourself (and having to get all the quoting and escaping right):

import csv

# csv.writer objects have no close() method; close the underlying file instead.
with open('items.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow([firmitem.title, firmitem.url])
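
To illustrate the quoting point, here is a small sketch using only the standard library (Python 3). The sample values are made up for illustration: one contains a comma and the other contains double quotes, either of which would corrupt a line joined by hand.

```python
import csv
import io

# Hypothetical field values: one with a comma, one with embedded quotes.
row = ['White & Case LLP, New York', 'He said "hello"']

# Write the row to an in-memory buffer instead of a real file.
buffer = io.StringIO(newline='')
csv.writer(buffer).writerow(row)
encoded = buffer.getvalue()

# Reading the text back recovers the original two fields.
decoded = next(csv.reader(io.StringIO(encoded, newline='')))
```

The writer wraps both fields in quotes and doubles the embedded quote characters, so the reader can split the line back into exactly two fields.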

Upvotes: 0
