Reputation: 329
I have a very simple Glue ETL Job with the following code:
from pyspark.context import SparkContext
sc = SparkContext.getOrCreate()
conf = sc.getConf()
print(conf.toDebugString())
The Job is created with a Redshift connection enabled. When executing the Job I get:
No module named pyspark.context
The public documentation all seems to mention, point to, and imply that pyspark is available, so why is my environment complaining that it doesn't have it? What steps am I missing?
Best Regards, Lim
Upvotes: 1
Views: 3711
Reputation: 589
Python shell jobs only support plain Python and libraries such as pandas, scikit-learn, etc. They don't support PySpark, so you should create a job with Job type = Spark and ETL language = Python to make it work.
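If you create the job programmatically, a minimal boto3 sketch looks like this (the job name, role ARN, script location, and connection name are placeholders, not values from the question):

import boto3

glue = boto3.client("glue")

# Job type "Spark" corresponds to Command Name "glueetl";
# a Python shell job ("pythonshell") has no PySpark available.
glue.create_job(
    Name="my-spark-etl-job",                                   # hypothetical job name
    Role="arn:aws:iam::123456789012:role/MyGlueRole",          # hypothetical IAM role
    Command={
        "Name": "glueetl",                                     # Spark ETL, not "pythonshell"
        "ScriptLocation": "s3://my-bucket/scripts/job.py",     # hypothetical script path
        "PythonVersion": "3",
    },
    Connections={"Connections": ["my-redshift-connection"]},   # optional, as in the question
    GlueVersion="3.0",
)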
Upvotes: 5
Reputation: 515
In a Glue Spark ETL job I use:
from pyspark.context import SparkContext
from awsglue.context import GlueContext

# These imports only resolve in a Spark ETL job, where pyspark and awsglue
# are on the path; a Python shell job fails on the first import.
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
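With that context in place, a typical next step is to read a table through the Glue Data Catalog; a minimal sketch, where the catalog database and table names are placeholders:

# Read a Data Catalog table into a DynamicFrame, then convert to a Spark DataFrame
dyf = glueContext.create_dynamic_frame.from_catalog(
    database="my_database",   # hypothetical catalog database
    table_name="my_table",    # hypothetical catalog table
)
df = dyf.toDF()
df.show(5)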
Upvotes: 1