Reputation: 484
I have a Scrapy + Selenium spider packaged in a Docker container. I want to run that container, passing some arguments to the spider. However, for some reason I receive a strange error message. I did an extensive search and tried many different options before submitting this question.
Dockerfile:
FROM python:2.7
# install google chrome
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
RUN apt-get -y update
RUN apt-get install -y google-chrome-stable
# install chromedriver
RUN apt-get install -yqq unzip
RUN wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
# install xvfb
RUN apt-get install -yqq xvfb
# install pyvirtualdisplay
RUN pip install pyvirtualdisplay
# set display port and dbus env to avoid hanging
ENV DISPLAY=:99
ENV DBUS_SESSION_BUS_ADDRESS=/dev/null
#install scrapy
RUN pip install --upgrade pip && \
    pip install --upgrade \
        setuptools \
        wheel && \
    pip install --upgrade scrapy
# install selenium
RUN pip install selenium==3.8.0
# install xlrd
RUN pip install xlrd
# install bs4
RUN pip install beautifulsoup4
ADD . /tralala/
WORKDIR tralala/
CMD scrapy crawl personel_spider_mpc -a chunksNo=$chunksNo -a chunkI=$chunkI
I guess that the problem may be in the CMD part.
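If I understand the Docker docs correctly, the shell form of CMD used above runs the command through /bin/sh -c, which is what should expand $chunksNo and $chunkI when the container starts; the JSON (exec) form would not expand them:

# shell form (used above): /bin/sh -c expands the variables at run time
CMD scrapy crawl personel_spider_mpc -a chunksNo=$chunksNo -a chunkI=$chunkI
# exec form would NOT expand them; the spider would receive the literal string "$chunksNo"
# CMD ["scrapy", "crawl", "personel_spider_mpc", "-a", "chunksNo=$chunksNo", "-a", "chunkI=$chunkI"]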
Spider init part:
class Crawler(scrapy.Spider):
    name = "personel_spider_mpc"
    allowed_domains = ['tralala.de',]

    def __init__(self, vdisplay = True, **kwargs):
        super(Crawler, self).__init__(**kwargs)
        self.chunkI = chunkI
        self.chunksNo = chunksNo
How I run the container:
docker run --env chunksNo='10' --env chunkI='1' ostapp/tralala
I tried both with quotation marks and without them.
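To check that the variables actually reach the container, I can override the command just for a quick test (image name as above):

docker run --env chunksNo='10' --env chunkI='1' ostapp/tralala \
    sh -c 'echo "chunksNo=$chunksNo chunkI=$chunkI"'
# prints: chunksNo=10 chunkI=1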
The error message:
2018-04-04 16:42:32 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 98, in crawl
    six.reraise(*exc_info)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 79, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 102, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/scrapy/spiders/__init__.py", line 51, in from_crawler
    spider = cls(*args, **kwargs)
  File "/tralala/tralala/spiders/tralala_spider_mpc.py", line 673, in __init__
    self.chunkI = chunkI
NameError: global name 'chunkI' is not defined
Upvotes: 1
Views: 837
Reputation: 487
Your arguments are stored in kwargs, which is just a dictionary, with keys acting as argument names and values as argument values. It does not define any names for you, so you get your NameError.
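For example, scrapy crawl personel_spider_mpc -a chunksNo=10 -a chunkI=1 effectively constructs the spider as Crawler(chunksNo='10', chunkI='1'), so inside __init__ you have kwargs == {'chunksNo': '10', 'chunkI': '1'} (note that -a values always arrive as strings).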
For more details, see this answer
In your specific case, try self.chunkI = kwargs['chunkI'] and self.chunksNo = kwargs['chunksNo'].
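A minimal sketch of the fixed __init__ (the .get() fallbacks of '1' are my own illustrative defaults, not something from your spider):

class Crawler(scrapy.Spider):
    name = "personel_spider_mpc"
    allowed_domains = ['tralala.de',]

    def __init__(self, vdisplay=True, **kwargs):
        super(Crawler, self).__init__(**kwargs)
        # The -a arguments live in the kwargs dict, not in local names.
        # .get() avoids a KeyError when an argument is omitted; the '1'
        # defaults here are illustrative assumptions.
        self.chunkI = kwargs.get('chunkI', '1')
        self.chunksNo = kwargs.get('chunksNo', '1')

As an aside, Scrapy's base Spider.__init__ copies kwargs onto the instance (self.__dict__.update(kwargs)), so after your super() call the attributes may already be set; the two explicit assignments only fail because chunkI and chunksNo are not defined as local names.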
Upvotes: 1