Elisabetta

confluent-kafka (with Kerberos): ImportError when spark-submitting a Python job in cluster mode

I am facing the following error when submitting a Python job in cluster mode:

appcache/application_1548793257188_803870/container_e80_1548793257188_803870_01_000001/environment/lib/python2.7/site-packages/confluent_kafka/__init__.py", line 2, in <module>
    from .cimpl import (Consumer,  # noqa
ImportError: librdkafka.so.1: cannot open shared object file: No such file or directory

librdkafka and the other Python dependencies are installed ONLY on an edge node. Before submitting, I create a virtual environment and pip install confluent-kafka as follows:

pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org --no-binary :all: confluent-kafka
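For completeness, the full sequence on the edge node looks roughly like this (the virtualenv name environment and the Python 2.7 version match the paths above; adjust to your setup):

virtualenv --python=python2.7 environment
source environment/bin/activate
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org --no-binary :all: confluent-kafka
deactivate

Note that --no-binary :all: builds confluent-kafka from source, so the resulting extension module links against the librdkafka installed on the edge node.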

After that, I create environment.tar.gz and pass it to spark-submit with --archives.
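Roughly like this (this sketch assumes the #environment alias, which makes YARN unpack the archive under ./environment in each container, matching the relative paths used below):

cd environment
tar -czf ../environment.tar.gz .
cd ..
spark-submit --archives environment.tar.gz#environment ...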

I have tried to set Spark properties like this:

--conf spark.executorEnv.LD_LIBRARY_PATH=/usr/lib64:environment/lib/python2.7/site-packages/confluent_kafka/.libs
--conf spark.driver.extraLibraryPath=/usr/lib64:environment/lib/python2.7/site-packages/confluent_kafka/.libs
--conf spark.yarn.appMasterEnv.LD_LIBRARY_PATH=environment/lib/python2.7/site-packages/confluent_kafka/.libs
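Putting it all together, the submit command looks roughly like this (job.py is a placeholder for my actual script):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --archives environment.tar.gz#environment \
  --conf spark.executorEnv.LD_LIBRARY_PATH=/usr/lib64:environment/lib/python2.7/site-packages/confluent_kafka/.libs \
  --conf spark.driver.extraLibraryPath=/usr/lib64:environment/lib/python2.7/site-packages/confluent_kafka/.libs \
  --conf spark.yarn.appMasterEnv.LD_LIBRARY_PATH=environment/lib/python2.7/site-packages/confluent_kafka/.libs \
  job.py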

But unfortunately it didn't work!

Has anybody faced the same problem?
