Reputation: 916
I'm aware that this same error has been asked about before, but in this example and this other example the error is caused by using pyspark.sql functions inside a udf. That is not what I am doing.
The offending code is the following function definition (if I remove the default parameter my code runs and passes all tests).
from pyspark.sql import functions as F

def apply_filter(df, group=F.lit(True)):
    filtered_df = df.filter(group)
    return filtered_df
I am mainly looking for the reason that this code produces the same error that is found in the other examples.
Edit:
I cannot share the original code for work reasons, but if you run the code above with spark-submit --deploy-mode cluster <filename>, the following error is produced.
LogType:stdout
Log Upload Time:Fri Mar 09 16:01:45 +0000 2018
LogLength:343
Log Contents:
Traceback (most recent call last):
File "temp.py", line 3, in <module>
def apply_filter(df, group=F.lit(True)):
File "/mnt/yarn/usercache/hadoop/appcache/application_1520603520946_0005/container_1520603520946_0005_01_000001/pyspark.zip/pyspark/sql/functions.py", line 40, in _
AttributeError: 'NoneType' object has no attribute '_jvm'
End of LogType:stdout
Interestingly enough, the error doesn't occur if the code is run locally.
Upvotes: 0
Views: 1706
Reputation: 10076
This error occurs when the Spark context cannot be instantiated. When you use a pyspark.sql function inside a UDF, you are trying to instantiate a Spark context within it, which is not allowed.
There can be several reasons why the Spark context cannot be instantiated.
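In this case, the likely reason is not a UDF but Python itself: a default parameter value is evaluated once, when the def statement runs. So F.lit(True) executes at module import time, before any SparkContext/SparkSession has been created, and the wrapper in functions.py finds no active context (hence the NoneType._jvm error). Locally it presumably works only because a context already exists by the time the definition is evaluated. A minimal sketch of a workaround, assuming you just want an "always true" default filter, is to use None as the sentinel and defer the F.lit call into the function body:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def apply_filter(df, group=None):
    # Defer the F.lit(True) call until the function is invoked,
    # by which time a SparkSession (and its JVM gateway) exists.
    if group is None:
        group = F.lit(True)
    return df.filter(group)

if __name__ == "__main__":
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
    apply_filter(df).show()                    # no filtering applied
    apply_filter(df, F.col("id") > 1).show()   # keeps only rows with id > 1

With None as the default, nothing touches the JVM at import time, so the module can be imported by spark-submit in cluster mode before the session is built, while the call signature stays the same.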
Upvotes: 1