Hesam Norin

Reputation: 87

Unable to Retrieve Environment Variables in Scrapy's Settings.py for Scrapyd Deployment

I'm new to Scrapy and currently attempting to deploy my spider to a Scrapyd server. However, I'm encountering an issue where I can't seem to use os.getenv within my Scrapy settings file.

This is how I'm attempting to set up my settings.py:

# settings.py
import os
from dotenv import load_dotenv

load_dotenv()

SENTRY_DSN = os.getenv("SENTRY_DSN")
MONGO_URI = os.getenv("MONGO_URI")

In my spider code, I'm trying to access these variables like this:

import pymongo
from pymongo.collection import Collection

def get_collection(self) -> Collection:
    client = pymongo.MongoClient(self.settings.get("MONGO_URI"))
    database = client["jobs"]
    collection = database[self.name]
    return collection

I'm using scrapyd-client to deploy my spider to my servers, but it seems I'm doing something wrong, as I can't access these environment variables in my settings file. This is the full response from the server:

{"node_name": "scrapyd-nd1-c68b9c799-cmwjd", "status": "error", "message": "Traceback (most recent call last):\n  File \"<frozen runpy>\", line 198, in _run_module_as_main\n  File \"<frozen runpy>\", line 88, in _run_code\n  File \"/usr/local/lib/python3.11/dist-packages/scrapyd/runner.py\", line 49, in <module>\n    main()\n  File \"/usr/local/lib/python3.11/dist-packages/scrapyd/runner.py\", line 45, in main\n    execute()\n  File \"/usr/local/lib/python3.11/dist-packages/scrapy/cmdline.py\", line 128, in execute\n    settings = get_project_settings()\n               ^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.11/dist-packages/scrapy/utils/project.py\", line 71, in get_project_settings\n    settings.setmodule(settings_module_path, priority=\"project\")\n  File \"/usr/local/lib/python3.11/dist-packages/scrapy/settings/__init__.py\", line 383, in setmodule\n    module = import_module(module)\n             ^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/lib/python3.11/importlib/__init__.py\", line 126, in import_module\n    return _bootstrap._gcd_import(name[level:], package, level)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"<frozen importlib._bootstrap>\", line 1206, in _gcd_import\n  File \"<frozen importlib._bootstrap>\", line 1178, in _find_and_load\n  File \"<frozen importlib._bootstrap>\", line 1149, in _find_and_load_unlocked\n  File \"<frozen importlib._bootstrap>\", line 690, in _load_unlocked\n  File \"<frozen importlib._bootstrap_external>\", line 940, in exec_module\n  File \"<frozen importlib._bootstrap>\", line 241, in _call_with_frames_removed\n  File \"/tmp/jobFlow-1695549937-0y8mcj3h.egg/jobFlow/settings.py\", line 4, in <module>\n  File \"/tmp/jobFlow-1695549937-0y8mcj3h.egg/dotenv/main.py\", line 336, in load_dotenv\n  File \"/tmp/jobFlow-1695549937-0y8mcj3h.egg/dotenv/main.py\", line 300, in find_dotenv\n  File \"/tmp/jobFlow-1695549937-0y8mcj3h.egg/dotenv/main.py\", line 257, in _walk_to_root\nOSError: Starting path not found\n"}

This is the full command I'm running:

scrapyd-deploy --include-dependencies 

Any ideas on how I can solve this?

Upvotes: 0

Views: 426

Answers (1)

Santi1999

Reputation: 33

I think I figured out what is happening. scrapyd runs your project from a packed .egg file, and python-dotenv's find_dotenv() walks up the directory tree starting from the calling module's file path; inside the egg that path does not exist on disk, hence OSError: Starting path not found. If you can refrain from calling load_dotenv() in code that scrapyd imports, you should be able to execute spiders via curl.

In your settings.py, remove the dotenv calls and keep import os:

import os
# from dotenv import load_dotenv
# load_dotenv()
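
If you would rather keep .env support for local development, a possible workaround (untested with scrapyd on my end) is to make dotenv start its search from the current working directory, which always exists, instead of the caller's file path; python-dotenv's find_dotenv accepts a usecwd flag for exactly this:

import os

try:
    # usecwd=True starts the .env lookup from os.getcwd(), which exists
    # even when this module is imported from a packed .egg.
    from dotenv import load_dotenv, find_dotenv
    load_dotenv(find_dotenv(usecwd=True))
except (ImportError, OSError):
    # dotenv not installed, or no usable starting path: fall back to
    # whatever is already set in the process environment.
    pass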

Then read your secrets from the process environment in settings.py like this:

SENTRY_DSN = os.environ.get('SENTRY_DSN')
MONGO_URI = os.environ.get('MONGO_URI')

Once your settings.py is set up with the environment variables you need, use get_project_settings from scrapy.utils.project to read the values from your settings.py file:

from scrapy.utils.project import get_project_settings
SENTRY_DSN = get_project_settings().get('SENTRY_DSN')
MONGO_URI = get_project_settings().get('MONGO_URI')
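
Note that inside a running spider you don't need get_project_settings at all: self.settings is already populated by the crawler, so a method like the one in your question should work unchanged once dotenv is out of the way. A minimal sketch (the "jobs" database name is taken from the question, the spider name is a placeholder):

import pymongo
from pymongo.collection import Collection
import scrapy


class JobsSpider(scrapy.Spider):
    name = "jobs"  # placeholder spider name

    def get_collection(self) -> Collection:
        # self.settings includes the values settings.py read from the
        # environment, e.g. MONGO_URI.
        client = pymongo.MongoClient(self.settings.get("MONGO_URI"))
        return client["jobs"][self.name]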

Then run scrapyd in the terminal and deploy; default is the deploy target defined in your scrapy.cfg file:

scrapyd
scrapyd-deploy default

Doing it this way allowed me to run scrapyd-deploy default successfully!
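
For reference, the default deploy target comes from scrapy.cfg; a minimal example might look like this (the project name is taken from your traceback, the URL is scrapyd's default):

# scrapy.cfg
[settings]
default = jobFlow.settings

[deploy]
url = http://localhost:6800/
project = jobFlow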

curl http://localhost:6800/schedule.json -d project=<project> -d spider=<spider_name>
output: {"node_name": "Name-MBP", "status": "ok", "jobid": "bd605850...."}

This is my error message; it's similar to yours, if not identical. Hopefully you find this useful, or maybe it helps someone else experiencing something similar!

{"node_name": "NAME-MBP", "status": "error", "message": "Traceback (most recent call last):\n  File \"<frozen runpy>\", line 198, in _run_module_as_main\n  File \"<frozen runpy>\", line 88, in _run_code\n  File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapyd/runner.py\", line 38, in <module>\n    main()\n  File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapyd/runner.py\", line 34, in main\n    execute()\n  File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapy/cmdline.py\", line 160, in execute\n    cmd.crawler_process = CrawlerProcess(settings)\n                          ^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapy/crawler.py\", line 357, in __init__\n    super().__init__(settings)\n  File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapy/crawler.py\", line 227, in __init__\n    self.spider_loader = self._get_spider_loader(settings)\n                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapy/crawler.py\", line 221, in _get_spider_loader\n    return loader_cls.from_settings(settings.frozencopy())\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapy/spiderloader.py\", line 79, in from_settings\n    return cls(settings)\n           ^^^^^^^^^^^^^\n  File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapy/spiderloader.py\", line 34, in __init__\n    self._load_all_spiders()\n  File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapy/spiderloader.py\", line 63, in _load_all_spiders\n    for module in walk_modules(name):\n                  ^^^^^^^^^^^^^^^^^^\n  File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapy/utils/misc.py\", line 106, in walk_modules\n    submod = import_module(fullpath)\n             ^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/Cellar/[email protected]/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py\", line 126, in import_module\n    return _bootstrap._gcd_import(name[level:], package, level)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"<frozen importlib._bootstrap>\", line 1204, in _gcd_import\n  File \"<frozen importlib._bootstrap>\", line 1176, in _find_and_load\n  File \"<frozen importlib._bootstrap>\", line 1147, in _find_and_load_unlocked\n  File \"<frozen importlib._bootstrap>\", line 690, in _load_unlocked\n  File \"<frozen importlib._bootstrap_external>\", line 940, in exec_module\n  File \"<frozen importlib._bootstrap>\", line 241, in _call_with_frames_removed\n  File \"/Users/name/Coding/proj/legislative/test/test/eggs/proj/num.egg/test/spiders/spider.py\", line 4, in <module>\n  File \"/Users/name/Coding/proj/lib/python3.11/site-packages/dotenv/main.py\", line 336, in load_dotenv\n    dotenv_path = find_dotenv()\n                  ^^^^^^^^^^^^^\n  File \"/Users/name/Coding/proj/lib/python3.11/site-packages/dotenv/main.py\", line 300, in find_dotenv\n    for dirname in _walk_to_root(path):\n  File \"/Users/name/Coding/proj/lib/python3.11/site-packages/dotenv/main.py\", line 257, in _walk_to_root\n    raise IOError('Starting path not found')\nOSError: Starting path not found\n"}

Upvotes: 0
