Dodge_X

Reputation: 93

Scrapy does not write the JSON file to the same path when started by Scrapyd

I use Scrapy to scrape information from a website and then write it to a JSON file.

It works correctly when I start the spider with Scrapy itself, but when I start it through Scrapyd, the JSON file is not created in the same path.

import math
from typing import Any
import scrapy
from scrapy.http import Response
import json
from scrapy.utils.conf import closest_scrapy_cfg
import os


class NsidcInfoSpider(scrapy.Spider):
    # (name and other spider attributes not shown)

    def __init__(self, start_urls=None, *args, **kwargs):
        super(NsidcInfoSpider, self).__init__(*args, **kwargs)
        # XPath namespaces
        self.namespaces = {
           ...
        }
        
        # get the store dir
        proj_root = closest_scrapy_cfg()
        if proj_root:
            proj_root = os.path.dirname(proj_root)
        proj_root = proj_root + "\\files\\info"
        if not os.path.exists(proj_root):
            os.makedirs(proj_root)
            
        self.file = open(proj_root + '\\entries.json', 'a')     

    def parse(self, response):
        # get the website information and parse it to obj
        obj = {}
        json_string = json.dumps(obj)
        self.file.write(json_string + '\n')
        

    def closed(self, reason):
        self.file.close()

Upvotes: 0

Views: 37

Answers (1)

Dodge_X

Reputation: 93

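The closest_scrapy_cfg() function in Scrapy locates the nearest scrapy.cfg configuration file: it searches the current working directory and then its parent directories, and returns an empty string when it finds nothing. When Scrapyd dispatches the spider, the process typically does not run from inside the project directory, so the lookup comes back empty, proj_root is reduced to "\files\info", and the JSON file is not written in the Scrapy project dir.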
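
To make the lookup behaviour concrete, here is a minimal stdlib-only sketch of an upward search for a config file. The function name find_cfg_upwards is hypothetical and this is not Scrapy's own code; it only mimics the "walk up from the starting directory, return an empty string if nothing is found" behaviour described above.

```python
from pathlib import Path


def find_cfg_upwards(start: str, filename: str = "scrapy.cfg") -> str:
    """Return the nearest `filename` at or above `start`, or "" if none exists.

    Hypothetical helper for illustration only; it imitates the documented
    behaviour of an upward config search, not Scrapy's implementation.
    """
    current = Path(start).resolve()
    while True:
        candidate = current / filename
        if candidate.exists():
            return str(candidate)
        if current.parent == current:
            # Reached the filesystem root without finding the file.
            return ""
        current = current.parent
```

Started from inside the project tree, the search finds the project's scrapy.cfg. Started from an unrelated working directory, it returns "", and any path built by appending to that result no longer points into the project.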

To resolve this, I opted to write the JSON file using the FILES_STORE setting, allowing me to specify an absolute path for the file.

        # needs: from scrapy.utils.project import get_project_settings
        settings = get_project_settings()
        file_store = settings.get('FILES_STORE')
        if file_store:
            # use the parent directory of FILES_STORE as the base
            file_store = os.path.dirname(file_store)
        file_store = file_store + "\\nsidc\\info"
        if not os.path.exists(file_store):
            os.makedirs(file_store)

        self.file = open(file_store + '\\entries.json', 'a')
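
The same idea can be sketched as a small helper that refuses anything but an absolute base directory, so the output location no longer depends on where the process was started. The name open_entries_file and its parameters are hypothetical; unlike the snippet above, it uses the base directory as given rather than its parent, and it builds the path with os.path.join instead of hard-coded backslashes.

```python
import os


def open_entries_file(base_dir, *subdirs, filename="entries.json"):
    """Open `filename` for appending under an absolute `base_dir`.

    Hypothetical helper: `base_dir` would come from a setting such as
    FILES_STORE. A missing or relative value raises ValueError instead of
    silently writing somewhere relative to the working directory.
    """
    if not base_dir or not os.path.isabs(base_dir):
        raise ValueError("base_dir must be an absolute path, got %r" % (base_dir,))
    target_dir = os.path.join(base_dir, *subdirs)
    # exist_ok=True avoids a separate os.path.exists() check.
    os.makedirs(target_dir, exist_ok=True)
    return open(os.path.join(target_dir, filename), "a", encoding="utf-8")
```

In the spider this could be called as self.file = open_entries_file(file_store, "nsidc", "info"), with file_store read from the settings as shown above.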

Upvotes: 0
