Reputation: 345
I am facing a strange issue in pyspark where I want to define and use a UDF. I am always getting this error:
TypeError: Invalid returnType: returnType should be DataType or str but is <'pyspark.sql.types.IntegerType'>
My code is actually very simple:
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType
def square(x):
return 2
def _process():
spark = SparkSession.builder.master("local").appName('process').getOrCreate()
spark_udf = udf(square,IntegerType)
The problem is probably with the IntegerType but I don't know what is wrong with that. I am using Python version 3.5.3
and the spark version 2.4.1
Upvotes: 3
Views: 4116
Reputation: 5526
Since you are using IntegerType
directly without calling it is causing issue
def _process():
spark = SparkSession.builder.master("local").appName('process').getOrCreate()
spark_udf = udf(square,IntegerType())
Try by calling the type IntegerType()
and it should work fine.
Upvotes: 5