Joshua Howard

Reputation: 916

AttributeError: 'NoneType' object has no attribute '_jvm' when passing sql function as a default parameter

I'm aware that this same error has been asked about before, but in this example and this other example, the errors are caused by using pyspark.sql functions in a UDF. That is not what I am doing.

The offending code is the following function definition (if I remove the default parameter, my code runs and passes all tests).

from pyspark.sql import functions as F

def apply_filter(df, group=F.lit(True)):
    filtered_df = df.filter(group)
    return filtered_df

I am mainly looking for the reason this code produces the same error as the ones found in the other examples.

Edit:

I cannot share the original code for work reasons, but running the code above with spark-submit --deploy-mode cluster <filename> produces the following error.

LogType:stdout
Log Upload Time:Fri Mar 09 16:01:45 +0000 2018
LogLength:343
Log Contents:
Traceback (most recent call last):
  File "temp.py", line 3, in <module>
    def apply_filter(df, group=F.lit(True)):
  File "/mnt/yarn/usercache/hadoop/appcache/application_1520603520946_0005/container_1520603520946_0005_01_000001/pyspark.zip/pyspark/sql/functions.py", line 40, in _
AttributeError: 'NoneType' object has no attribute '_jvm'
End of LogType:stdout

Interestingly enough, the error doesn't occur if the code is run locally.

Upvotes: 0

Views: 1706

Answers (1)

MaFF

Reputation: 10076

This error occurs when no SparkContext is available at the moment a pyspark.sql function is called. These functions are thin wrappers that go through the JVM gateway of the active SparkContext, so calling one before a context exists raises exactly this AttributeError; using one inside a UDF fails for the same reason, since that code runs on executors where no SparkContext exists. In your case the trigger is that Python evaluates default parameter values once, at function definition time: F.lit(True) runs as soon as the module is imported, and under spark-submit in cluster mode no SparkContext has been created yet, so SparkContext._active_spark_context is still None and accessing its _jvm attribute fails (that is the line 40 of functions.py in your traceback).
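The usual fix is to defer the F.lit call into the function body, so that it only runs when the function is invoked and a SparkContext already exists. A minimal sketch, assuming the default is meant to be a pass-through filter:

from pyspark.sql import functions as F

def apply_filter(df, group=None):
    # Default values are evaluated at definition time; None is safe then.
    # F.lit(True) is only built when the function is actually called, by
    # which point the SparkContext (and its JVM gateway) is available.
    if group is None:
        group = F.lit(True)
    return df.filter(group)

This is the same sentinel-default idiom commonly used to avoid mutable default arguments in Python.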

There can be several reasons why no SparkContext is available (a quick way to check is sketched after the list):

  • bad spark configuration
  • conflicting node configurations
  • deploying a jar that does not match the cluster configuration
  • ...
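To see which situation you are hitting, you can inspect the same internal attribute the traceback points at. A minimal sketch (note that SparkContext._active_spark_context is private API and may change between Spark versions):

from pyspark import SparkContext
from pyspark.sql import SparkSession

# Under spark-submit, nothing has created a context at import time, so
# this prints None -- the state in which F.lit(True) raises
# AttributeError: 'NoneType' object has no attribute '_jvm'.
print(SparkContext._active_spark_context)

spark = SparkSession.builder.getOrCreate()

# Once a session exists, the same attribute holds the live context
# whose _jvm gateway the pyspark.sql functions call into.
print(SparkContext._active_spark_context)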

Upvotes: 1
