Todd Owen

Reputation: 16228

Using LD_PRELOAD with Apache Spark (or YARN)

We are running Spark jobs on Apache Hadoop YARN. I have a special need to use the "LD_PRELOAD trick" on these jobs. (Before anyone panics, it's not for production runs; this is part of automated job testing).

I know how to submit additional files with the job, and I know how to set environment variables on the nodes, so adding these settings to spark-defaults.conf almost provides a solution:

spark.files=/home/todd/pwn_connect.so
spark.yarn.appMasterEnv.LD_PRELOAD=pwn_connect.so
spark.executorEnv.LD_PRELOAD=pwn_connect.so

But I get this error in the container logs:

ERROR: ld.so: object 'pwn_connect.so' from LD_PRELOAD cannot be preloaded: ignored.

The problem seems to be that LD_PRELOAD doesn't accept the relative path that I'm providing. But I don't know how to provide an absolute path -- I don't have a clue where on the local filesystem of the nodes these files are being placed.
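For what it's worth, the failure is easy to reproduce locally with any shared object (a sketch, using ls as an arbitrary command):

# Bare filename: ld.so searches the standard library paths, not the
# working directory, so this reproduces the "cannot be preloaded" error.
LD_PRELOAD=pwn_connect.so ls

# A path containing a slash is loaded as-is, relative to the working
# directory, so this succeeds (assuming pwn_connect.so is in $PWD).
LD_PRELOAD=./pwn_connect.so ls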

Upvotes: 2

Views: 1111

Answers (2)

sam

Reputation: 1

I had a similar problem for a year and a half and tried several approaches, but nothing worked until I saw this. Thank you.

--conf spark.yarn.dist.files=/usr/lib64/libopenblas64.so \
--conf spark.yarn.appMasterEnv.LD_PRELOAD=./libopenblas64.so \
--conf spark.executorEnv.LD_PRELOAD=./libopenblas64.so \

Upvotes: 0

Todd Owen

Reputation: 16228

Firstly, spark.files is not used when running on YARN; it should be spark.yarn.dist.files. And note that this will be overwritten if the --files argument is passed to spark-submit.
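For example, here is a minimal sketch of the command-line form (my_job.jar is a placeholder, not from the question); the --files list replaces, rather than merges with, any spark.yarn.dist.files value in spark-defaults.conf:

spark-submit \
  --master yarn \
  --files /home/todd/pwn_connect.so \
  my_job.jar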

For LD_PRELOAD, there are two solutions that will work (a complete spark-submit sketch follows the list):

  1. Relative paths can be used; they need to be prefixed with ./:

    spark.yarn.dist.files=/home/todd/pwn_connect.so
    spark.yarn.appMasterEnv.LD_PRELOAD=./pwn_connect.so
    spark.executorEnv.LD_PRELOAD=./pwn_connect.so
    

    (a bare library name without ./ is searched for on the library search path, such as LD_LIBRARY_PATH, rather than in the current working directory).

  2. If an absolute path is preferred, examining the Spark source code reveals that the whole command line, including environment variable assignments, is subject to expansion by the shell, so the expression $PWD will be expanded to the current working directory:

    spark.yarn.dist.files=/home/todd/pwn_connect.so
    spark.yarn.appMasterEnv.LD_PRELOAD=$PWD/pwn_connect.so
    spark.executorEnv.LD_PRELOAD=$PWD/pwn_connect.so
    
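Putting solution 1 together as a single spark-submit invocation (a sketch; my_job.jar again stands in for the real application):

    spark-submit \
      --master yarn \
      --conf spark.yarn.dist.files=/home/todd/pwn_connect.so \
      --conf spark.yarn.appMasterEnv.LD_PRELOAD=./pwn_connect.so \
      --conf spark.executorEnv.LD_PRELOAD=./pwn_connect.so \
      my_job.jar

If solution 2 is used on the command line instead, quote the value (e.g. LD_PRELOAD='$PWD/pwn_connect.so') so that the local shell does not expand $PWD before it reaches the container.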

Upvotes: 1
