Spaceman

Reputation: 1205

Override Scrapy output format 'on the fly'

I want to override the spider's output format directly in the code. I can't modify the settings or change the command line, so I want to do it in the spider's __init__ method.

Ideally, the new output format should take effect even if something like -o /tmp/1.csv is passed to the spider. If that's not possible, I can live without it.

How can I do that?

Thank you.

Upvotes: 4

Views: 507

Answers (1)

Elias Dorneles

Reputation: 23856

You can set a custom attribute on your spider that describes how its data should be handled, and then create a Scrapy item pipeline that honors that configuration.

Your spider code would look like:

from scrapy import Spider


class MySpider(Spider):
    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.data_destination = self._get_data_destination()

    def _get_data_destination(self):
        # return your dynamically discovered data destination settings here
        return None  # placeholder so the method has a valid body

And your item pipeline would be something like:

class MySuperDuperPipeline(object):
    def process_item(self, item, spider):
        # default to None so spiders without the attribute don't raise AttributeError
        data_destination = getattr(spider, 'data_destination', None)

        # code to handle item conforming to data destination here...

        return item
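
To make the "handle item conforming to data destination" part more concrete, here is a minimal sketch of one way such a pipeline could look. The class name DataDestinationPipeline and the shape of data_destination (a dict with "format" and "path" keys) are assumptions made up for illustration, not anything Scrapy defines. Only open_spider, process_item and close_spider are the standard pipeline hooks.

```python
import csv
import json


class DataDestinationPipeline:
    """Hypothetical pipeline that writes items in the format the spider asks for."""

    def open_spider(self, spider):
        # Assumed shape: {"format": "csv" or "jsonl", "path": "..."}.
        # A missing attribute or None means this pipeline does nothing.
        self.dest = getattr(spider, "data_destination", None)
        self.file = None
        self.writer = None
        if self.dest:
            self.file = open(self.dest["path"], "w", newline="", encoding="utf-8")

    def process_item(self, item, spider):
        if self.file is None:
            return item
        row = dict(item)
        if self.dest["format"] == "csv":
            if self.writer is None:
                # Take the CSV header from the first item's keys.
                self.writer = csv.DictWriter(self.file, fieldnames=list(row))
                self.writer.writeheader()
            self.writer.writerow(row)
        else:
            # Anything else is treated as JSON Lines in this sketch.
            self.file.write(json.dumps(row) + "\n")
        return item

    def close_spider(self, spider):
        if self.file is not None:
            self.file.close()
```

The pipeline still has to be enabled through ITEM_PIPELINES. If the project settings are off limits, a spider can set that in its custom_settings class attribute. Note that a pipeline like this writes in addition to whatever -o produces; it does not replace or disable the feed export.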

Upvotes: 1
