Reputation: 2909
The same PySpark code works on r7a instances but not on r7g or r8g on an EMR cluster (7.5).
I build the python environment with conda, and use it in pyspark:
conda create -n pyspark python=3.9 --show-channel-urls --channel=conda-forge --override-channels
conda init bash
python -m pip install conda-pack # separate from the req.txt because no hash is given.
conda run -n pyspark python -m pip install -r req.txt
conda pack -n pyspark --output ./pulse-spark-deployment.tar.gz
It is used with the following command line (all on one line, split here for ease of reading):
bash -c "
PYSPARK_PYTHON=./environment/bin/python
PYTHONPATH=./app
spark-submit
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python
--conf spark.yarn.appMasterEnv.PYTHONPATH=./app
--master yarn
--deploy-mode cluster
--packages
org.apache.spark:spark-avro_2.12:3.5.2,
org.apache.hadoop:hadoop-aws:3.4.0,
org.apache.spark:spark-hadoop-cloud_2.12:3.5.2
--archives
s3://<bucket>/spark/spark-deployment.tar.gz#environment,
s3://<bucket>/spark/spark.zip#app
s3://<bucket>/spark/script.py
"
It works perfectly if I use r7a instances, but it fails if I use Graviton ones (r7g or r8g).
The errors I get from YARN are:
User application exited with 126
and
./environment/bin/python: ./environment/bin/python: cannot execute binary file
This is typical of an executable built for the wrong architecture, but adding --platform-linux-aarch64
to the conda create line does not change anything.
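For reference, one way to see which architecture actually ended up in the packed archive (a quick check, using the paths from the build steps above) is to extract it locally and inspect the interpreter with file:

mkdir -p /tmp/env-check
tar -xzf ./pulse-spark-deployment.tar.gz -C /tmp/env-check
file /tmp/env-check/bin/python3.9   # an aarch64 build reports "ARM aarch64", an x86 build reports "x86-64"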
What could go wrong here?
Upvotes: 0
Views: 67
Reputation: 9877
Make sure you use --platform=linux-aarch64 and not --platform-linux-aarch64, according to the docs.
Running on an Ubuntu 24 x86 host:
~$ conda create -n pyspark_graviton python=3.9 --show-channel-urls --channel=conda-forge --override-channels --platform=linux-aarch64
[...]
~$ miniconda3/envs/pyspark_graviton/bin/python3.9 --version
-bash: miniconda3/envs/pyspark_graviton/bin/python3.9: cannot execute binary file: Exec format error
~$ ls -l miniconda3/envs/pyspark_graviton/bin/python3.9
-rwxrwxr-x 1 ubuntu ubuntu 4221904 Dec 30 21:50 miniconda3/envs/pyspark_graviton/bin/python3.9
~$ file miniconda3/envs/pyspark_graviton/bin/python3.9
miniconda3/envs/pyspark_graviton/bin/python3.9: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, not stripped
It correctly prepares the arm64 Python binary, which fails to run on x86 (as expected).
Alternatively, you can use a Graviton host to prepare the environment, so you don't have to worry about --platform at all.
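If you don't have a Graviton machine handy, another option is to run the same build inside an arm64 container on the x86 host. This is only a sketch, assuming Docker with qemu/binfmt emulation is enabled and that the continuumio/miniconda3 image suits you; it mirrors the build steps from the question:

docker run --rm --platform linux/arm64 -v "$PWD:/work" -w /work continuumio/miniconda3 bash -c "
conda create -y -n pyspark python=3.9 --show-channel-urls --channel=conda-forge --override-channels
python -m pip install conda-pack
conda run -n pyspark python -m pip install -r req.txt
conda pack -n pyspark --output ./pulse-spark-deployment.tar.gz
"

The tarball lands in the current directory on the host through the bind mount, already built for aarch64.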
Upvotes: 0