Reputation: 3021
I still consider myself new to Python in general, so please bear with me on this! I'm attempting to use Scrapy to gather some data from websites. Once I've collected the data, I'd like it to be exported to a CSV file. So far my attempts with the following command have resulted in files that aren't set up as tables at all.
My export code:
scrapy crawl products -o myinfo.csv -t csv
I concluded that I need to write some sort of pipeline that will define what my column headers are. To the best of my ability that meant writing the following code in the following two files.
pipelines.py
class AllenheathPipeline(object):
    def process_item(self, item, spider):
        return item

from scrapy.conf import settings
from scrapy.contrib.exporter import CsvItemExporter

class AllenHeathCsvItemExporter(CsvItemExporter):
    def __init__(self, *args, **kwargs):
        delimiter = settings.get('CSV_DELIMITER', ',')
        kwargs['delimiter'] = delimiter
        fields_to_export = settings.get('FIELDS_TO_EXPORT', [])
        if fields_to_export:
            kwargs['fields_to_export'] = fields_to_export
        super(AllenHeathCsvItemExporter, self).__init__(*args, **kwargs)
settings.py
BOT_NAME = 'allenheath'
SPIDER_MODULES = ['allenheath.spiders']
NEWSPIDER_MODULE = 'allenheath.spiders'
ITEM_PIPELINES = {
    'allenheath.pipelines.AllenheathPipeline': 300,
    'allenheath.pipelines.AllenHeathCsvItemExporter': 800,
}
FEED_EXPORTERS = {
    'csv': 'allenheath.allen_heath_csv_item_exporter.AllenHeathCsvItemExporter',
}
FIELDS_TO_EXPORT = [
    'model',
    'shortdesc',
    'desc',
    'series'
]
CSV_DELIMITER = "\t" # For tab
Unfortunately, once I run the export command again:
scrapy crawl products -o myinfo.csv -t csv
I get this error:
  File "C:\allenheath\allenheath\pipelines.py", line 27, in __init__
    super(AllenHeathCsvItemExporter, self).__init__(*args, **kwargs)
TypeError: __init__() takes at least 2 arguments (1 given)
Any help or guidance would be greatly appreciated as I've hit a brick wall here. Thank you.
Upvotes: 1
Views: 2266
Reputation: 23796
You don't need to define a pipeline for exporting to CSV.
Scrapy handles that automagically: the information about the headers is taken from your Item definition.
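If you also care about which columns appear and in what order, Scrapy 1.0 and later read a FEED_EXPORT_FIELDS setting for that. A minimal settings.py sketch, assuming your Item declares exactly the four fields listed in the question and that you are on a version that has this setting (older releases that still use scrapy.contrib do not):

```python
# settings.py (fragment)
# Assumption: the Item class declares these four fields, as in the question.
# FEED_EXPORT_FIELDS controls which fields are written and their column order.
FEED_EXPORT_FIELDS = [
    'model',
    'shortdesc',
    'desc',
    'series',
]
```

With this in place the custom FIELDS_TO_EXPORT setting from the question should no longer be needed.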
Just drop the pipeline and try again. Btw, the -t csv option is optional in recent Scrapy versions: the target format is inferred from the filename extension.
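As for the TypeError itself: as far as I can tell, it comes from listing the exporter under ITEM_PIPELINES. Scrapy then builds it like a pipeline, with no arguments, while CsvItemExporter expects the output file as its first argument. The sketch below reproduces the same failure without Scrapy; FileExporter and TabExporter are hypothetical stand-ins, not Scrapy classes:

```python
import io


class FileExporter(object):
    """Hypothetical stand-in for an exporter whose constructor requires a file."""

    def __init__(self, file, **kwargs):
        self.file = file
        self.options = kwargs


class TabExporter(FileExporter):
    """Mirrors the subclass in the question: adds a keyword, forwards the rest."""

    def __init__(self, *args, **kwargs):
        kwargs['delimiter'] = '\t'
        super(TabExporter, self).__init__(*args, **kwargs)


# Built with no arguments (the way a pipeline class is built), it fails,
# because the required file argument is never supplied.
error = None
try:
    TabExporter()
except TypeError as exc:
    error = exc

# Built with a file object (the way an exporter is meant to be built), it works.
exporter = TabExporter(io.BytesIO())
```

So the subclass itself is not the problem; registering it as a pipeline is. Removing the 'allenheath.pipelines.AllenHeathCsvItemExporter' entry from ITEM_PIPELINES avoids that call.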
Upvotes: 2