Reputation: 191
I am getting the following error for the code below. Please help:
from delta.tables import *
ModuleNotFoundError: No module named 'delta.tables'
INFO SparkContext: Invoking stop() from shutdown hook
Here is the code: '''
from pyspark.sql import *

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
        .appName("DeltaLake") \
        .config("spark.jars", "delta-core_2.12-0.7.0") \
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
        .getOrCreate()

    from delta.tables import *

    data = spark.range(0, 5)
    data.printSchema()
'''
An online search suggested verifying that the Scala version matches the Delta core JAR version. Here are the Scala and JAR versions:
"delta-core_2.12-0.7.0"
"Using Scala version 2.12.10, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_221"
Upvotes: 11
Views: 20678
Reputation: 816
According to the delta package documentation, there is a Python file named tables. You should clone the repository and copy the delta folder under python/delta to your site-packages path (i.e. ..\python37\Lib\site-packages), then restart Python and your code runs without the error.
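A minimal sketch of that copy step in Python, assuming the repository was cloned to ./delta next to your script (the clone location and the use of the first site-packages entry are assumptions; adjust for your setup):
'''
import shutil
import site
from pathlib import Path

# Location of the cloned delta repository -- an assumption; change to your clone path.
repo = Path("delta")

# The Python bindings live under python/delta in the repository.
src = repo / "python" / "delta"

# Destination: the interpreter's first site-packages directory.
dest = Path(site.getsitepackages()[0]) / "delta"

# Copy the package so `from delta.tables import *` can be resolved.
shutil.copytree(str(src), str(dest))
print("Copied {} -> {}".format(src, dest))
'''
Restart the interpreter afterwards so the newly copied package is picked up.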
I am using Python 3.5.3, pyspark==3.0.1.
Upvotes: 5
Reputation: 406
There is a difference between spark.jars and spark.jars.packages: the former takes paths to local JAR files, while the latter takes Maven coordinates that Spark resolves and downloads at startup. Since you are following the Quick Start, try replacing
.config("spark.jars", "delta-core_2.12-0.7.0")
with
.config("spark.jars.packages", "io.delta:delta-core_2.12:0.7.0")
Upvotes: 4