Jatin

Reputation: 113

Glue not able to recognize Delta Lake Python Library

I am trying to use the Delta Lake Python library in my Glue job. However, the job does not recognize it, and I get the error "NameError: name 'DeltaTable' is not defined". Per the Glue-DeltaLake documentation, I added the parameter --datalake-formats = delta and also updated the required Spark configuration:

.config("spark.sql.extensions","io.delta.sql.DeltaSparkSessionExtension")
.config("spark.sql.catalog.spark_catalog","org.apache.spark.sql.delta.catalog.DeltaCatalog")

My code fails at the line below:

deltaTable = DeltaTable.forPath(self.spark, self.dest_path_sdad)

Any ideas?

Upvotes: 0

Views: 1119

Answers (2)

Jatin

Reputation: 113

I was missing the import statement:

from delta.tables import *
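For anyone hitting the same NameError, here is a minimal sketch of the failing call with an explicit import in place of the wildcard. Note that open_delta_table is a hypothetical helper name, not part of delta-spark; only DeltaTable.forPath comes from the library. The import is deferred into the function on the assumption that a readable error is more useful than a NameError further down the job.

```python
def open_delta_table(spark, path):
    """Return a DeltaTable for `path`.

    Hypothetical helper for illustration; not part of the delta-spark package.
    """
    try:
        # Explicit import instead of `from delta.tables import *`,
        # so it is obvious where DeltaTable comes from.
        from delta.tables import DeltaTable
    except ImportError as exc:
        # Raised when the delta Python package is not importable in this job.
        raise RuntimeError(
            "delta.tables could not be imported; the delta-spark Python "
            "package does not appear to be available in this job"
        ) from exc
    return DeltaTable.forPath(spark, path)
```

In the job from the question this would be called as open_delta_table(self.spark, self.dest_path_sdad).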

Upvotes: 0

Alex Ott

Reputation: 87069

These configuration properties enable the Delta Lake file format in Glue, so you can write spark.read.format("delta").load(...) or df.write.format("delta").save(...). But they don't provide the Python API, which is available as the delta-spark package. It can be made available to Glue with the --additional-python-modules option (doc).
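As a rough sketch, the two job parameters mentioned in this thread could be assembled like this. delta_job_arguments is a hypothetical helper name, and the version you pass in is your own choice: I am assuming it should match the Delta Lake version bundled with your Glue release, so check that before pinning anything.

```python
def delta_job_arguments(delta_spark_version):
    """Build Glue job parameters for Delta Lake.

    Hypothetical helper for illustration; only the two parameter
    names come from the Glue documentation.
    """
    return {
        # Enables the Delta Lake file format for spark.read / df.write.
        "--datalake-formats": "delta",
        # Installs the Python API (delta.tables) at job start.
        "--additional-python-modules": f"delta-spark=={delta_spark_version}",
    }


# "2.1.0" is a placeholder version, not a recommendation.
job_args = delta_job_arguments("2.1.0")
```

The resulting dictionary has the shape of a job's default arguments, so it could be entered as job parameters in the console or passed as DefaultArguments when creating the job through the API.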

Upvotes: 0
