Scrapy's Custom CSV headers for CsvItemExporter

Question

I'm trying to parse XML and convert it to CSV. The tricky part is that the headers must exactly match the terms specified in the documentation of a third-party CSV parser, and those terms contain spaces between words, e.g. "Item title", "Item description", etc.

Since Item fields are defined as class attributes in items.py, I can't create field names containing spaces, e.g.

Item title = scrapy.Field()

I've tried adding to settings.py:

FEED_EXPORT_FIELDS = ["Item title", "Item description"]

This changes the CSV headers, but the header names no longer match the Item fields, so no data is written to the .csv file.

    class MySpider(XMLFeedSpider):
        name = 'example'
        allowed_domains = ['example.com']
        start_urls = ['http://example.com/feed.xml']
        itertag = 'item'

        def parse_node(self, response, node):
            item = FeedItem()
            item['id'] = node.xpath('//*[name()="g:id"]/text()').get()
            item['title'] = node.xpath('//*[name()="g:title"]/text()').get()
            item['description'] = node.xpath('//*[name()="g:description"]/text()').get()

            return item

The parser works fine and I get all the data I need. The issue is just with the CSV headers.

Is there an easy way to add custom headers that don't match the Item field names and can contain several words?

Output I currently get:

id, title, description
12345, Lorem Ipsum, Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
12346, Quick Fox, The quick brown fox jumps over the lazy dog.

Desired output should look like this:

ID, Item title, Item description
12345, Lorem Ipsum, Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
12346, Quick Fox, The quick brown fox jumps over the lazy dog.

Input for testing:



  Example
  http://www.example.com
  Description of Example.com
        
            12345
            Lorem Ipsum
            Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
        
        
            12346
            Quick Fox
            The quick brown fox jumps over the lazy dog.

And this is the content of items.py:

import scrapy

class FeedItem(scrapy.Item):
    id = scrapy.Field()
    title = scrapy.Field()
    description = scrapy.Field()

Granitosaurus · Accepted Answer

You can write your own CSV exporter. The simplest approach is to subclass the existing CsvItemExporter and override the method that writes the header row:

# exporters.py 
from scrapy.exporters import CsvItemExporter

class MyCsvItemExporter(CsvItemExporter):
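    # Item field name -> header text to write in the CSV;
    # fields missing from this map keep their own name
    header_map = {
        'id': 'ID',
        'title': 'Item title',
        'description': 'Item description',
    }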

    def _write_headers_and_set_fields_to_export(self, item):
        if not self.include_headers_line:
            return
        # the logic below is copied from the parent class
        if not self.fields_to_export:
            if isinstance(item, dict):
                # for dicts try using fields of the first item
                self.fields_to_export = list(item.keys())
            else:
                # use fields declared in Item
                self.fields_to_export = list(item.fields.keys())
        headers = list(self._build_row(self.fields_to_export))

        # our addition: replace each field name with its mapped header,
        # falling back to the field name when no mapping exists
        headers = [self.header_map.get(header, header) for header in headers]
        self.csv_writer.writerow(headers)
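
To see the header mapping in isolation, here is a minimal standard-library sketch of the same idea, with no Scrapy involved. The `write_csv` helper, the `HEADER_MAP` name, and the sample rows are hypothetical and only for illustration. It applies the same `header_map.get(field, field)` lookup to the header line, while the data lines are still looked up by the original field names:

```python
import csv
import io

# Item field name -> header text expected by the third-party parser
HEADER_MAP = {"id": "ID", "title": "Item title", "description": "Item description"}


def write_csv(rows, fields, header_map):
    """Write dict rows to CSV text, renaming only the header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header line: mapped names; unmapped fields fall back to their own name
    writer.writerow([header_map.get(field, field) for field in fields])
    # Data lines: values are still looked up by the original field names
    for row in rows:
        writer.writerow([row.get(field, "") for field in fields])
    return buffer.getvalue()


rows = [
    {"id": "12345", "title": "Lorem Ipsum", "description": "Lorem ipsum dolor sit amet"},
    {"id": "12346", "title": "Quick Fox", "description": "The quick brown fox jumps over the lazy dog."},
]
output = write_csv(rows, ["id", "title", "description"], HEADER_MAP)
print(output)
```

Because only the header line goes through the map, the item fields themselves can keep valid Python identifiers.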

And then activate it in your settings:

FEED_EXPORTERS = {
    'csv': 'myproject.exporters.MyCsvItemExporter',
}
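
A possible alternative that avoids subclassing: the documentation for recent Scrapy releases describes FEED_EXPORT_FIELDS as also accepting a dict, where the keys are item field names and the values are the output names. The fragment below is a sketch under that assumption. Whether it works depends on the Scrapy version installed, so check the documentation for your version before relying on it:

```python
# settings.py
# Assumes a Scrapy version whose FEED_EXPORT_FIELDS accepts a dict
# (keys: Item field names, values: header text written to the output).
FEED_EXPORT_FIELDS = {
    "id": "ID",
    "title": "Item title",
    "description": "Item description",
}
```

If the installed version only accepts a list here, the custom exporter above remains the way to go.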
