Reputation: 916
I'm aware that this same error has been asked about before, but in this example and this other example the error is caused by using pyspark.sql functions inside a udf. That is not what I am doing.
The offending code is the following function definition (if I remove the default parameter my code runs and passes all tests).
from pyspark.sql import functions as F

def apply_filter(df, group=F.lit(True)):
    filtered_df = df.filter(group)
    return filtered_df
I am mainly looking for the reason that this code produces the same error that is found in the other examples.
Edit:
I cannot share the original code for work reasons, but if you run the code above with spark-submit --deploy-mode cluster <filename>, the following error is produced.
LogType:stdout
Log Upload Time:Fri Mar 09 16:01:45 +0000 2018
LogLength:343
Log Contents:
Traceback (most recent call last):
File "temp.py", line 3, in <module>
def apply_filter(df, group=F.lit(True)):
File "/mnt/yarn/usercache/hadoop/appcache/application_1520603520946_0005/container_1520603520946_0005_01_000001/pyspark.zip/pyspark/sql/functions.py", line 40, in _
AttributeError: 'NoneType' object has no attribute '_jvm'
End of LogType:stdout
Interestingly enough, the error doesn't occur if the code is run locally.
Upvotes: 0
Views: 1706
Reputation: 10076
This error occurs when the Spark context cannot be instantiated. When you use a pyspark.sql function inside a UDF, you are trying to instantiate a Spark context within it, which is not allowed.
There can be several reasons why the Spark context cannot be instantiated.
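In this case, the likely reason is not a UDF but Python itself: a default parameter value is evaluated once, when the def statement runs. So F.lit(True) executes at module import time, before any SparkContext/SparkSession has been created, and the wrapper in functions.py finds no active context (hence the NoneType._jvm error). Locally it presumably works only because a context already exists by the time the definition is evaluated. A minimal sketch of a workaround, assuming you just want an "always true" default filter, is to use None as the sentinel and defer the F.lit call into the function body:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def apply_filter(df, group=None):
    # Defer the F.lit(True) call until the function is invoked,
    # by which time a SparkSession (and its JVM gateway) exists.
    if group is None:
        group = F.lit(True)
    return df.filter(group)

if __name__ == "__main__":
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
    apply_filter(df).show()                    # no filtering applied
    apply_filter(df, F.col("id") > 1).show()   # keeps only rows with id > 1

With None as the default, nothing touches the JVM at import time, so the module can be imported by spark-submit in cluster mode before the session is built, while the call signature stays the same.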
Upvotes: 1