Reputation: 2917
I want to save all downloaded images of a crawl in a specific folder, so I can run multiple spiders in the same project at the same time without the images of multiple crawls ending up in one folder.
The image folder destination is defined in settings.py:
import os

project_dir = os.path.dirname(__file__) + '/../'  # one level above the directory settings.py is in
IMAGES_STORE = project_dir + "images"
My spider class in spidername.py looks like this:
from scrapy.spiders import CrawlSpider


class GetbidSpider(CrawlSpider):
    name = 'test_spider'
My image pipeline looks like this:
import scrapy
from scrapy.exceptions import DropItem
from scrapy.pipelines.images import ImagesPipeline


class MyImagesPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        for image_url in item['image_urls']:
            yield scrapy.Request(image_url)

    def item_completed(self, results, item, info):
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        item['image_paths'] = image_paths
        return item
How can I access the name of the current spider within settings in order to create a dynamic image download folder?
Upvotes: 0
Views: 684
Reputation: 18799
One way would be to override the ImagesPipeline, more specifically its image_downloaded method, so you can do whatever you want with what the crawler returns.
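As a minimal sketch of that idea: instead of image_downloaded I override file_path here, since that is the method where the stored path is computed (image_downloaded ends up using it). It assumes a Scrapy version where file_path takes the item keyword argument and where info.spider is available; the per-spider subfolder naming is just one possible scheme, not something Scrapy prescribes:

from scrapy.pipelines.images import ImagesPipeline


class PerSpiderImagesPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None, *, item=None):
        # Default path computed by Scrapy, e.g. 'full/<sha1>.jpg'
        path = super().file_path(request, response=response, info=info, item=item)
        # Prefix it with the running spider's name so every crawl
        # gets its own subfolder under IMAGES_STORE
        return f"{info.spider.name}/{path}"

Enable this class in ITEM_PIPELINES instead of the stock ImagesPipeline and each spider's images will land under IMAGES_STORE/<spider name>/.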
Now, I assume you want that settings variable to change on each run, so you don't have to edit settings.py every time before running a spider.
An alternative for changing the setting on each run would be to pass it on the command line:
scrapy crawl test_spider -s IMAGES_STORE=test_spider
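If you start your crawls from a script rather than the command line, the same idea applies. Here is a small sketch that sets IMAGES_STORE programmatically before starting the spider; the images/test_spider folder layout is just an assumption for the example:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
# Point IMAGES_STORE at a per-spider folder before the crawl starts
settings.set("IMAGES_STORE", "images/test_spider")

process = CrawlerProcess(settings)
process.crawl("test_spider")  # spider name as registered in the project
process.start()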
Another way would be to set it in custom_settings for each spider in your code:
class GetbidSpider(CrawlSpider):
    name = 'test_spider'
    custom_settings = {
        'IMAGES_STORE': 'test_spider',
    }
and just run your spider normally:
scrapy crawl test_spider
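If you have many spiders and don't want to repeat the custom_settings block in each of them, one further option (a sketch of my own, built on Scrapy's update_settings hook, with an assumed images/<spider name> layout) is a small mixin that derives the folder from the spider's name:

from scrapy.spiders import CrawlSpider


class PerSpiderImageStoreMixin:
    @classmethod
    def update_settings(cls, settings):
        # Apply custom_settings as usual, then derive IMAGES_STORE from the spider name
        super().update_settings(settings)
        settings.set("IMAGES_STORE", f"images/{cls.name}", priority="spider")


class GetbidSpider(PerSpiderImageStoreMixin, CrawlSpider):
    name = 'test_spider'

Running scrapy crawl test_spider then stores that spider's images under images/test_spider without any per-spider configuration.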
Upvotes: 2