Reputation: 358
I am facing the following error while submitting a Python job in cluster mode:
appcache/application_1548793257188_803870/container_e80_1548793257188_803870_01_000001/environment/lib/python2.7/site-packages/confluent_kafka/__init__.py", line 2, in <module>
    from .cimpl import (Consumer,  # noqa
ImportError: librdkafka.so.1: cannot open shared object file: No such file or directory
librdkafka and the other Python dependencies are installed ONLY on an edge node. Before submitting, I create a virtual environment and pip install confluent-kafka in the following way:
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org --no-binary :all: confluent-kafka
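For context, the surrounding steps look roughly like this (a minimal sketch; I assume the virtualenv is named environment so that it matches the archive name below):

virtualenv environment
source environment/bin/activate
# the pip install command above is then run inside this activated venv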
After that, I create environment.tar.gz from that virtual environment and pass it to spark-submit with --archives.
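The whole submit looks roughly like this (a sketch: my_job.py is a placeholder for my script, and the #environment alias is my assumption for how the archive ends up unpacked under environment/ in the container, matching the path in the error above):

cd environment && tar -zcf ../environment.tar.gz . && cd ..
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --archives environment.tar.gz#environment \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=environment/bin/python \
  my_job.py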
I have also tried to set Spark properties like this:
--conf spark.executorEnv.LD_LIBRARY_PATH=/usr/lib64:environment/lib/python2.7/site-packages/confluent_kafka/.libs
--conf spark.driver.extraLibraryPath=/usr/lib64:environment/lib/python2.7/site-packages/confluent_kafka/.libs
--conf spark.yarn.appMasterEnv.LD_LIBRARY_PATH=environment/lib/python2.7/site-packages/confluent_kafka/.libs
But unfortunately it didn't work!
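On the edge node I can at least inspect what the archive ships and what the compiled extension links against (a sketch, with the paths taken from the error above; cimpl.so is the compiled extension that the failing import pulls in):

ls environment/lib/python2.7/site-packages/confluent_kafka/.libs/
ldd environment/lib/python2.7/site-packages/confluent_kafka/cimpl.so | grep librdkafka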
Has anybody faced the same problem?
Upvotes: 0
Views: 427