I am trying to use Delta Lake in Zeppelin running on EMR. My simple bootstrap script is below. I am using delta-spark 0.0.1 because the Spark version on EMR is 2.4.4. When I try to create a Spark session in a notebook, I get the following exception.
Fail to execute line 7: spark = configure_spark_with_delta_pip(builder).getOrCreate()
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-2848864742877081666.py", line 380, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "", line 7, in <module>
NameError: name 'configure_spark_with_delta_pip' is not defined
I also tried adding delta-core_2.11-0.6.1.jar before creating the session, but had no luck and got the exception below.
module 'pyspark.sql.utils' has no attribute 'convert_exception'
#!/usr/bin/env bash
# Upgrade pip
python3 -m pip uninstall botocore --user
python3 -m pip uninstall boto3 --user
sudo python3 -m pip install --upgrade pip
python3 -m pip install botocore==1.13.38 --user
sudo yum -y install python3-devel
python3 -m pip install keras==2.2.4 --user
python3 -m pip install numpy==1.19.1 --user
sudo python3 -m pip install pyarrow
sudo python3 -m pip install boto3
sudo python3 -m pip install torch
sudo python3 -m pip install scipy
sudo python3 -m pip install pandas
sudo python3 -m pip install delta-spark==0.0.1
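
For reference, here is a minimal sketch of the session setup I would expect to work without the pip helper. It rests on assumptions I have not verified on this cluster: that the Delta Lake 0.6.x line is the one built for Spark 2.4 with Scala 2.11, that `configure_spark_with_delta_pip` only ships with later delta-spark releases, and that the cluster can reach Maven to resolve the package. The names `delta_packages_conf` and `build_session` are hypothetical helpers made up for this illustration.

```python
def delta_packages_conf(delta_version="0.6.1", scala_version="2.11"):
    """Return the Spark conf entry that pulls the Delta Lake JAR from Maven.

    The default versions are assumptions chosen to match Spark 2.4.x.
    """
    coordinate = "io.delta:delta-core_{}:{}".format(scala_version, delta_version)
    return {"spark.jars.packages": coordinate}


def build_session(app_name="delta-test"):
    """Create (or reuse) a SparkSession with the Delta package configured."""
    # Imported lazily so delta_packages_conf() works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in delta_packages_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In Zeppelin the Spark interpreter has usually started before the notebook paragraph runs, so `getOrCreate()` may hand back the existing session and ignore the package setting. In that case the same `spark.jars.packages` value would probably have to go into the interpreter settings instead.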