Explorer

Reputation: 1647

Delta Lake on EMR and Zeppelin gives 'configure_spark_with_delta_pip' is not defined

I am trying to use Delta Lake on Zeppelin running on EMR. Below is my simple bootstrap script. I am using delta-spark 0.0.1 because the Spark version on EMR is 2.4.4. When I try to create a Spark session in the notebook, I get the exception below.

Fail to execute line 7: spark = configure_spark_with_delta_pip(builder).getOrCreate()
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-2848864742877081666.py", line 380, in exec(code, _zcUserQueryNameSpace)
  File "", line 7, in
NameError: name 'configure_spark_with_delta_pip' is not defined

I also tried adding delta-core_2.11-0.6.1.jar before creating the session, but had no luck and got the exception below (shown in screenshot).

module 'pyspark.sql.utils' has no attribute 'convert_exception'

#!/usr/bin/env bash

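# Remove any existing botocore and boto3 (-y skips the confirmation prompt;
# pip uninstall does not accept --user)
python3 -m pip uninstall -y botocore
python3 -m pip uninstall -y boto3

# Upgrade pip
sudo python3 -m pip install --upgrade pip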

python3 -m pip install botocore==1.13.38 --user

sudo yum -y install python3-devel

python3 -m pip install keras==2.2.4 --user

python3 -m pip install numpy==1.19.1 --user


sudo python3 -m pip install pyarrow
sudo python3 -m pip install boto3
sudo python3 -m pip install torch
sudo python3 -m pip install scipy 
sudo python3 -m pip install pandas
sudo python3 -m pip install delta-spark==0.0.1

When I add the delta-core jar with addPyFile

Upvotes: 1

Views: 2003

Answers (1)

zsxwing

Reputation: 20816

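delta-spark 0.0.1 is a test version that was published only to verify the pip release process. It is not a real release, which is why configure_spark_with_delta_pip is missing. You should use 1.0.0 or above, depending on your Spark version. You can find the compatibility matrix in the docs.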

sudo python3 -m pip install delta-spark==1.0.0
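
After reinstalling, something like the following could be run in a notebook paragraph to confirm that a real release is present before building the session. This is only a sketch under a few assumptions: it needs Python 3.8+ for importlib.metadata; the helper names (version_tuple, is_usable_delta_spark, installed_delta_spark, build_session) and the app name are made up for illustration; the two config keys are the ones used in the Delta Lake quick start. Note also that Delta Lake 1.0.x is listed against Spark 3.1.x in the compatibility matrix, so a Spark 2.4.4 cluster would likely need a newer EMR release first.

```python
import re
from importlib import metadata

# First real delta-spark release, per the answer above.
MIN_DELTA_SPARK = (1, 0, 0)


def version_tuple(text):
    """Convert a version string such as '1.0.0' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_usable_delta_spark(version):
    """True if the version is 1.0.0 or above (0.0.1 is only a test upload)."""
    return version_tuple(version) >= MIN_DELTA_SPARK


def installed_delta_spark():
    """Return the installed delta-spark version, or None if it is absent."""
    try:
        return metadata.version("delta-spark")
    except metadata.PackageNotFoundError:
        return None


def build_session(app_name="delta-demo"):
    """Build a Delta-enabled SparkSession (hypothetical helper).

    Imports are local so the checks above work even without pyspark.
    """
    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession

    builder = (
        SparkSession.builder.appName(app_name)
        .config("spark.sql.extensions",
                "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    )
    return configure_spark_with_delta_pip(builder).getOrCreate()


found = installed_delta_spark()
if found is None:
    print("delta-spark is not installed")
elif not is_usable_delta_spark(found):
    print(f"delta-spark {found} is not a real release; install 1.0.0 or above")
else:
    print(f"delta-spark {found} found; call build_session() to start Spark")
```

If the check passes, spark = build_session() should stand in for the failing line 7 from the question.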

Upvotes: 2
