Reputation: 93
I use Scrapy to scrape information from a website and write it to a JSON file.
The spider works correctly when started by Scrapy itself, but when I start it through Scrapyd, the JSON file is not created in the same path.
import json
import math
import os
from typing import Any

import scrapy
from scrapy.http import Response
from scrapy.utils.conf import closest_scrapy_cfg


class NsidcInfoSpider(scrapy.Spider):
    name = "nsidc_info"  # spider name assumed; not shown in the original snippet

    def __init__(self, start_urls=None, *args, **kwargs):
        super(NsidcInfoSpider, self).__init__(*args, **kwargs)
        # XPath namespaces
        self.namespaces = {
            ...
        }
        # Get the store dir from the location of the project's scrapy.cfg
        proj_root = closest_scrapy_cfg()
        if proj_root:
            proj_root = os.path.dirname(proj_root)
        proj_root = os.path.join(proj_root, "files", "info")
        if not os.path.exists(proj_root):
            os.makedirs(proj_root)
        self.file = open(os.path.join(proj_root, "entries.json"), "a")

    def parse(self, response):
        # Get the website information, build obj from it, and write one JSON object per line
        obj = {}
        json_string = json.dumps(obj)
        self.file.write(json_string + "\n")

    def closed(self, reason):
        self.file.close()
Upvotes: 0
Views: 37
Reputation: 93
The closest_scrapy_cfg() function locates the nearest scrapy.cfg configuration file by searching upward from the current working directory. Scrapyd runs spiders from its own working directory rather than the project directory, so when I dispatched the spider through Scrapyd, the JSON file was not written inside the Scrapy project dir.
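For context, a closest_scrapy_cfg()-style lookup can be sketched roughly like this (a simplified reimplementation for illustration, not Scrapy's actual source): it walks upward from the given directory until it finds a scrapy.cfg, returning an empty string if none exists, so the result depends entirely on where the process was started.

```python
import os


def find_closest_cfg(path=".", prevpath=None):
    """Walk upward from `path` looking for scrapy.cfg.

    Returns the absolute path of the first scrapy.cfg found, or '' if
    the filesystem root is reached without finding one.
    """
    if path == prevpath:
        # Reached the root: dirname() no longer changes the path
        return ""
    path = os.path.abspath(path)
    cfgfile = os.path.join(path, "scrapy.cfg")
    if os.path.exists(cfgfile):
        return cfgfile
    return find_closest_cfg(os.path.dirname(path), path)
```

This is why the spider behaves differently under Scrapyd: started from a directory outside the project tree, the walk never encounters scrapy.cfg and the function returns ''.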
To resolve this, I build the output path from the FILES_STORE setting instead, which lets me specify an absolute path for the file regardless of the working directory.
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
file_store = settings.get('FILES_STORE')
if file_store:
    file_store = os.path.dirname(file_store)
file_store = os.path.join(file_store, "nsidc", "info")
if not os.path.exists(file_store):
    os.makedirs(file_store)
self.file = open(os.path.join(file_store, "entries.json"), 'a')
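Note that this only helps if FILES_STORE is itself an absolute path in the project's settings.py; the value below is an illustrative example, not taken from the original post:

```python
# settings.py (example value; any absolute path works)
FILES_STORE = "C:\\scrapy_data\\files"
```

With an absolute FILES_STORE, the output location is the same whether the spider is launched by scrapy crawl or dispatched by Scrapyd.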
Upvotes: 0