merlin

Reputation: 2917

How to access spider name within settings.py in scrapy

I want to save all downloaded images of a crawl in a specific folder, so that I can run multiple spiders in the same project at the same time without mixing the images of multiple crawls in one folder.

The img folder destination is defined within settings:

import os

project_dir = os.path.join(os.path.dirname(__file__), '..')  # absolute path of the project dir
IMAGES_STORE = os.path.join(project_dir, 'images')

My spider has a class like this within spidername.py:

class GetbidSpider(CrawlSpider):
    name = 'test_spider'

My image pipeline looks like this:

import scrapy
from scrapy.exceptions import DropItem
from scrapy.pipelines.images import ImagesPipeline


class MyImagesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        for image_url in item['image_urls']:
            yield scrapy.Request(image_url)

    def item_completed(self, results, item, info):
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        item['image_paths'] = image_paths
        return item

How can I access the name of the current spider within settings in order to create a dynamic image download folder?

Upvotes: 0

Views: 684

Answers (1)

eLRuLL

Reputation: 18799

One way would be to override the ImagesPipeline, more specifically its image_downloaded method, so you can do whatever you want with what the crawler returns.
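As a side note: in current Scrapy versions the hook that actually decides where each image is written is the pipeline's file_path method. Here is a minimal, stdlib-only sketch of the naming logic (the helper name spider_image_path is my own, and the pipeline wiring in the comment is illustrative):

```python
import hashlib


def spider_image_path(spider_name, url):
    # Mirror Scrapy's default naming scheme (a SHA1 hash of the image URL),
    # but prefix the path with the spider's name so each spider gets its
    # own subfolder under IMAGES_STORE.
    image_guid = hashlib.sha1(url.encode()).hexdigest()
    return f'{spider_name}/full/{image_guid}.jpg'


# Inside a custom pipeline this helper could be called from a
# file_path() override, roughly like:
#
# class MyImagesPipeline(ImagesPipeline):
#     def file_path(self, request, response=None, info=None, *, item=None):
#         return spider_image_path(info.spider.name, request.url)
```

With this in place, images from test_spider land under IMAGES_STORE/test_spider/full/ without touching settings.py at all.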

Now, I assume you want to change that settings variable each time you run a spider, so you don't have to go in and edit settings.py before every run.

An alternative to changing the setting on each run is to pass it as a command-line settings override:

scrapy crawl test_spider -s IMAGES_STORE=test_spider

Another way would be to set it in custom_settings for each spider in your code:

class GetbidSpider(CrawlSpider):
    name = 'test_spider'

    custom_settings = {
        'IMAGES_STORE': 'test_spider',
    }

and just run your spider normally:

scrapy crawl test_spider
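If you want the folder to follow the spider's name automatically instead of hard-coding it per spider, Scrapy spiders also have an update_settings classmethod that can be overridden. The sketch below uses a plain dict as a stand-in for Scrapy's Settings object so it runs on its own; in a real project the class would subclass CrawlSpider and the method would call settings.set(..., priority='spider'):

```python
class GetbidSpider:
    # Stand-in for scrapy.spiders.CrawlSpider; a plain class is used
    # here so the sketch runs without Scrapy installed.
    name = 'test_spider'

    @classmethod
    def update_settings(cls, settings):
        # Hypothetical override of Scrapy's Spider.update_settings():
        # derive the image folder from the spider's name, so every
        # spider in the project gets its own folder automatically.
        settings['IMAGES_STORE'] = f'images/{cls.name}'


settings = {}
GetbidSpider.update_settings(settings)
# settings['IMAGES_STORE'] is now 'images/test_spider'
```

The advantage over custom_settings is that you write the rule once and every spider inherits it, since cls.name resolves to each subclass's own name.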

Upvotes: 2
