Reputation: 1296
Why does HttpCacheMiddleware need scrapy.cfg and how do I work around this issue?
I use scrapyd-deploy to build the egg and deploy the project to scrapyd.
When the job is run, I see from the log output that the HttpCacheMiddleware is disabled because scrapy.cfg is not found.
2014-06-08 18:55:51-0700 [scrapy] WARNING: Disabled HttpCacheMiddleware: Unable to find scrapy.cfg file to infer project data dir
I checked the egg file, and scrapy.cfg is indeed not there, because the egg only contains the files in the project directory. I could be wrong, but I think the egg is built correctly.
foo/
|- project/
| |- __init__.py
| |- settings.py
| |- spiders/
| |- ...
|- scrapy.cfg
Digging into the code further, I think one of the three if/elif/else branches in MiddlewareManager is failing somehow.
    try:
        mwcls = load_object(clspath)
        if crawler and hasattr(mwcls, 'from_crawler'):
            mw = mwcls.from_crawler(crawler)
        elif hasattr(mwcls, 'from_settings'):
            mw = mwcls.from_settings(settings)
        else:
            mw = mwcls()
        middlewares.append(mw)
    except NotConfigured, e:
        if e.args:
            clsname = clspath.split('.')[-1]
            log.msg(format="Disabled %(clsname)s: %(eargs)s",
                    level=log.WARNING, clsname=clsname, eargs=e.args[0])
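A stripped-down sketch of that control flow may help narrow things down. Going by the "Disabled %(clsname)s: %(eargs)s" format, the warning in the log appears to come from the except branch, which would mean the middleware was loaded but raised NotConfigured while being constructed. Everything below is a stand-in written for illustration (the NotConfigured class, FakeCacheMiddleware, and build_middlewares are hypothetical names, not Scrapy's code), and it uses Python 3 syntax rather than the Python 2 syntax of the snippet above.

```python
class NotConfigured(Exception):
    """Stand-in for scrapy.exceptions.NotConfigured."""


class FakeCacheMiddleware:
    """Hypothetical middleware whose constructor cannot configure itself."""

    def __init__(self):
        # Mimics a middleware that gives up when it cannot find its data dir.
        raise NotConfigured(
            "Unable to find scrapy.cfg file to infer project data dir")


def build_middlewares(classes):
    """Instantiate each class and record a warning for each one that opts out."""
    middlewares, warnings = [], []
    for mwcls in classes:
        try:
            middlewares.append(mwcls())
        except NotConfigured as e:
            # Same message shape as the log.msg call in the snippet above.
            if e.args:
                warnings.append(
                    "Disabled %s: %s" % (mwcls.__name__, e.args[0]))
    return middlewares, warnings


middlewares, warnings = build_middlewares([FakeCacheMiddleware])
print(warnings[0])
```

If this reading is right, none of the three branches is at fault; the question becomes why the middleware's own setup cannot locate scrapy.cfg.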
Upvotes: 3
Views: 733
Reputation: 440
Place an empty scrapy.cfg under your working directory.
As the source code shows, project_data_dir will try to find the closest scrapy.cfg and use it to infer the project data dir.
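To illustrate why an empty file is enough, here is a rough sketch of a "closest scrapy.cfg" lookup: start in a directory and walk up through its parents until a scrapy.cfg turns up. This is not Scrapy's actual implementation; find_closest_cfg and infer_data_dir are hypothetical helpers, and the ".scrapy" directory placed next to the config file is an assumption about where the data dir ends up.

```python
import os
import tempfile


def find_closest_cfg(start_dir, filename="scrapy.cfg"):
    """Return the nearest `filename` at or above start_dir, or '' if none."""
    path = os.path.abspath(start_dir)
    while True:
        candidate = os.path.join(path, filename)
        if os.path.exists(candidate):
            return candidate
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root without a match
            return ""
        path = parent


def infer_data_dir(start_dir):
    """Derive a data dir from the closest config file (assumed layout)."""
    cfg = find_closest_cfg(start_dir)
    if not cfg:
        raise RuntimeError(
            "Unable to find scrapy.cfg file to infer project data dir")
    return os.path.join(os.path.dirname(cfg), ".scrapy")


# Demo: an empty scrapy.cfg in the working directory is found from a subdir.
root = tempfile.mkdtemp()
workdir = os.path.join(root, "work")
subdir = os.path.join(workdir, "sub")
os.makedirs(subdir)
open(os.path.join(workdir, "scrapy.cfg"), "w").close()  # empty file

found = find_closest_cfg(subdir)
data_dir = infer_data_dir(subdir)
print(found)
print(data_dir)
```

The file's contents are never read in this sketch; only its location matters, which is consistent with an empty scrapy.cfg being sufficient.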
Upvotes: 1