jkupczak

Reputation: 3021

How do I export to csv when using Scrapy?

I still consider myself new to Python in general, so please bear with me on this! I'm attempting to use Scrapy to gather some data from websites. Once I've collected the data, I'd like it to be exported to a CSV file. So far, my attempts with the following command have produced files that aren't set up as tables at all.

My export code:

scrapy crawl products -o myinfo.csv -t csv

I concluded that I need to write some sort of pipeline that defines what my column headers are. As best I could tell, that meant adding the following code to these two files.

pipelines.py

class AllenheathPipeline(object):
    def process_item(self, item, spider):
        return item


from scrapy.conf import settings
from scrapy.contrib.exporter import CsvItemExporter

class AllenHeathCsvItemExporter(CsvItemExporter):

    def __init__(self, *args, **kwargs):
        delimiter = settings.get('CSV_DELIMITER', ',')
        kwargs['delimiter'] = delimiter

        fields_to_export = settings.get('FIELDS_TO_EXPORT', [])
        if fields_to_export :
            kwargs['fields_to_export'] = fields_to_export

        super(AllenHeathCsvItemExporter, self).__init__(*args, **kwargs)

settings.py

BOT_NAME = 'allenheath'

SPIDER_MODULES = ['allenheath.spiders']
NEWSPIDER_MODULE = 'allenheath.spiders'

ITEM_PIPELINES = {
    'allenheath.pipelines.AllenheathPipeline': 300,
    'allenheath.pipelines.AllenHeathCsvItemExporter': 800,
}

FEED_EXPORTERS = {
    'csv': 'allenheath.allen_heath_csv_item_exporter.AllenHeathCsvItemExporter',
}
FIELDS_TO_EXPORT = [
    'model',
    'shortdesc',
    'desc',
    'series'
]

CSV_DELIMITER = "\t" # For tab

Unfortunately, once I run the export command again:

scrapy crawl products -o myinfo.csv -t csv

I get this error:

File "C:\allenheath\allenheath\pipelines.py", line 27, in __init__
  super(AllenHeathCsvItemExporter, self).__init__(*args, **kwargs)
TypeError: __init__() takes at least 2 arguments (1 given)

Any help or guidance would be greatly appreciated as I've hit a brick wall here. Thank you.

Upvotes: 1

Views: 2266

Answers (1)

Elias Dorneles

Reputation: 23796

You don't need to define a pipeline for exporting to CSV.

Scrapy handles that automagically: the information about the headers is taken from your Item definition.
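If you also want to control which columns appear and in what order, a built-in setting should cover it without any custom class. The fragment below is a minimal sketch for settings.py, assuming Scrapy 1.0 or later (where `FEED_EXPORT_FIELDS` is available) and reusing the field names from your question:

```python
# settings.py (fragment)
# Assumption: Scrapy >= 1.0, where the built-in FEED_EXPORT_FIELDS setting exists.
# It takes the place of the custom FIELDS_TO_EXPORT setting from the question.
# The names listed here become the CSV header row, in this order.
FEED_EXPORT_FIELDS = [
    "model",
    "shortdesc",
    "desc",
    "series",
]
```

With this in place, the plain `scrapy crawl products -o myinfo.csv` command should write those four columns in the listed order.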

Just drop the pipeline and try again. Btw, the -t csv is optional in the latest Scrapy versions: the target format is inferred from the filename extension.
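As for the TypeError itself, my understanding is that it comes from listing the exporter class under ITEM_PIPELINES: a pipeline class gets created with no arguments, while a CSV exporter's constructor needs the output file. The sketch below reproduces that mismatch with plain Python only. `FileBackedExporter`, `TabExporter`, and both `build_like_*` helpers are hypothetical stand-ins, not Scrapy's actual classes or internals:

```python
import io


class FileBackedExporter:
    """Hypothetical stand-in for an exporter whose constructor requires a file."""

    def __init__(self, file, delimiter=","):
        self.file = file
        self.delimiter = delimiter


class TabExporter(FileBackedExporter):
    """Mirrors the shape of the subclass in the question: force a tab delimiter."""

    def __init__(self, *args, **kwargs):
        kwargs["delimiter"] = "\t"
        super().__init__(*args, **kwargs)


def build_like_a_pipeline(cls):
    # Assumption: pipeline classes are instantiated without positional arguments.
    return cls()


def build_like_a_feed_exporter(cls, file):
    # Assumption: feed exporters are handed the output file when created.
    return cls(file)


# Treated as a pipeline, the exporter never receives its file argument.
try:
    build_like_a_pipeline(TabExporter)
    pipeline_error = None
except TypeError as exc:
    pipeline_error = exc

# Treated as a feed exporter, the same class is constructed without trouble.
exporter = build_like_a_feed_exporter(TabExporter, io.StringIO())
```

Here `pipeline_error` ends up holding a TypeError about the missing `file` argument, which is the same kind of failure as in your traceback, while the second call succeeds. That is why removing the exporter from ITEM_PIPELINES should make the error go away.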

Upvotes: 2
