Lim

Reputation: 329

How to enable pySpark in Glue ETL?

I have a very simple Glue ETL Job with the following code:

from pyspark.context import SparkContext

# Reuse the job's SparkContext and print its configuration
sc = SparkContext.getOrCreate()
conf = sc.getConf()

print(conf.toDebugString())

The Job is created with a Redshift connection enabled. When executing the Job I get:

No module named pyspark.context

The public documentation all seems to mention, point to, and imply the availability of pyspark, so why does my environment complain that it isn't there? What steps am I missing?

Best Regards, Lim

Upvotes: 1

Views: 3711

Answers (2)

Aida Martinez

Reputation: 589

Python Shell jobs only support plain Python and libraries such as pandas, scikit-learn, etc.; they have no PySpark support. Create the job with job type = Spark and ETL language = Python instead, and PySpark will be available.
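If you create the job programmatically, the same distinction shows up in the Command name: "pythonshell" gives the Python Shell environment without PySpark, while "glueetl" gives a Spark ETL job. A minimal sketch with boto3 (the job name, IAM role, and script path below are placeholders):

import boto3

glue = boto3.client("glue")

# "glueetl" selects a Spark ETL job; "pythonshell" would give the
# PySpark-less Python Shell environment.
glue.create_job(
    Name="my-spark-etl-job",                                 # placeholder name
    Role="MyGlueServiceRole",                                # placeholder IAM role
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/job.py",   # placeholder path
        "PythonVersion": "3",
    },
)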

Upvotes: 5

Gianmar

Reputation: 515

I use:

from pyspark.context import SparkContext
from awsglue.context import GlueContext

sc = SparkContext()                   # plain Spark context
glueContext = GlueContext(sc)         # wrap it in a Glue context
spark = glueContext.spark_session     # SparkSession for DataFrame/SQL work
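
With the GlueContext in place you can read through the Data Catalog; a small sketch, assuming the database and table names below already exist in your catalog:

# Read a catalog table into a DynamicFrame (database/table names are placeholders)
dyf = glueContext.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_table",
)
print(dyf.count())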

Upvotes: 1
