Reputation: 589
In Scrapy, I have my items specified in a certain order in items.py, and my spider populates those items in the same order. However, when I run the spider and save the results as a CSV, the column order from items.py or the spider is not maintained. How can I get the CSV to show columns in a specific order? Example code would be much appreciated.
Thanks.
Upvotes: 12
Views: 11435
Reputation: 121
I don't know whether this existed when you asked your question, but Scrapy now provides a fields_to_export attribute on the BaseItemExporter class, from which CsvItemExporter inherits. From the documentation for version 0.22:
fields_to_export
A list with the name of the fields that will be exported, or None if you want to export all fields. Defaults to None.
Some exporters (like CsvItemExporter) respect the order of the fields defined in this attribute.
See also the documentation for BaseItemExporter and CsvItemExporter on the Scrapy website.
To use this feature, though, you will have to create your own ItemPipeline, as detailed in this answer.
Upvotes: 6
Reputation: 189
This is related to "Modifying CSV export in scrapy".
The problem is that the exporter is instantiated without any keyword arguments, so settings like EXPORT_FIELDS are ignored. The solution is the same: you need to subclass the CSV item exporter so that it passes the keyword arguments itself.
Following the above recipe, I created a new file xyzzy/feedexport.py (change "xyzzy" to the name of your Scrapy project package):
"""
The standard CsvItemExporter class does not pass the kwargs through to the
CSV writer, resulting in EXPORT_FIELDS and EXPORT_ENCODING being ignored
(EXPORT_EMPTY is not used by CSV).
"""

# Import paths as of Scrapy 0.22; later releases moved the exporters
# to scrapy.exporters.
from scrapy.conf import settings
from scrapy.contrib.exporter import CsvItemExporter


class CSVkwItemExporter(CsvItemExporter):

    def __init__(self, *args, **kwargs):
        # Read the column order and encoding from the project settings.
        kwargs['fields_to_export'] = settings.getlist('EXPORT_FIELDS') or None
        kwargs['encoding'] = settings.get('EXPORT_ENCODING', 'utf-8')

        super(CSVkwItemExporter, self).__init__(*args, **kwargs)
and then registered it in xyzzy/settings.py:
FEED_EXPORTERS = {
    'csv': 'xyzzy.feedexport.CSVkwItemExporter'
}
Now the CSV exporter will honor the EXPORT_FIELDS setting, so also add this to xyzzy/settings.py:
# By specifying the fields to export, the CSV export honors the order
# rather than using a random order.
EXPORT_FIELDS = [
    'field1',
    'field2',
    'field3',
]
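For readers on a newer release: Scrapy 1.0 and later include a built-in FEED_EXPORT_FIELDS setting that controls which fields are exported and in what order, so the custom exporter above may not be needed there. A sketch of the equivalent settings.py entry, reusing the placeholder field names from above:

```python
# xyzzy/settings.py (assumes Scrapy 1.0 or later)
# Built-in setting; the CSV feed export writes columns in this order.
FEED_EXPORT_FIELDS = [
    'field1',
    'field2',
    'field3',
]
```

With this in place, running the spider with CSV output (for example via the -o option) should produce the columns in the listed order without any FEED_EXPORTERS override.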
Upvotes: 18