Reputation: 2115
I'm trying to submit an Apache Spark driver program to a remote cluster, and I'm having difficulties with the Python package called mysql. I installed this package on all Spark nodes. The cluster runs inside docker-compose, and the images are based on bde2020.
$ docker-compose logs impressions-agg
impressions-agg_1 | Submit application /app/app.py to Spark master spark://spark-master:7077
impressions-agg_1 | Passing arguments
impressions-agg_1 | 19/11/13 18:45:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
impressions-agg_1 | Traceback (most recent call last):
impressions-agg_1 | File "/app/app.py", line 6, in <module>
impressions-agg_1 | from mysql.connector import connect
impressions-agg_1 | ModuleNotFoundError: No module named 'mysql'
impressions-agg_1 | log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
impressions-agg_1 | log4j:WARN Please initialize the log4j system properly.
impressions-agg_1 | log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
The mysql module is installed via pip on all nodes:
$ docker-compose exec spark-master pip list
Package Version
--------------- -------------------
mysql-connector 2.2.9
pip 18.1
setuptools 40.8.0.post20190503
$ docker-compose exec spark-worker pip list
Package Version
--------------- -------------------
mysql-connector 2.2.9
pip 18.1
setuptools 40.8.0.post20190503
How can I solve this? Thank you for any information.
Upvotes: 0
Views: 706
Reputation: 5421
While the Spark nodes have mysql-connector installed, the driver container does not. What the logs are telling you is that impressions-agg_1 contains a script at /app/app.py which is trying to import mysql but cannot find it.
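You can confirm this from inside the driver container itself (assuming the compose service is named impressions-agg, matching your docker-compose logs command):
$ docker-compose exec impressions-agg pip list
If mysql-connector is missing from that list while it appears on spark-master and spark-worker, the driver simply isn't running where you installed the package.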
Did you create the impressions-agg image yourself? Add a RUN pip install mysql-connector step (the same package your pip list output shows on the nodes) to its Dockerfile.
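A minimal sketch of what that Dockerfile could look like; the base image and tag here are illustrative, since I don't know which bde2020 image you build from:
FROM bde2020/spark-submit:2.4.4-hadoop2.7
# Install the same MySQL driver package the Spark nodes already have,
# so /app/app.py can resolve 'from mysql.connector import connect'
RUN pip install mysql-connector
COPY app.py /app/app.py
Then rebuild and restart the service so the new layer takes effect:
$ docker-compose build impressions-agg
$ docker-compose up -d impressions-agg
If your image builds from bde2020's spark-python-template instead, that template installs a requirements.txt at build time, so listing mysql-connector there should have the same effect.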
Upvotes: 1