user2523848
user2523848

Reputation: 345

Invalid Return Type in pyspark for UDF

I am facing a strange issue in pyspark where I want to define and use a UDF. I am always getting this error:

TypeError: Invalid returnType: returnType should be DataType or str but is <'pyspark.sql.types.IntegerType'>

My code is actually very simple:

from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType

def square(x):
    return 2

def _process():
    spark = SparkSession.builder.master("local").appName('process').getOrCreate()
    spark_udf = udf(square,IntegerType)

The problem is probably with the IntegerType but I don't know what is wrong with that. I am using Python version 3.5.3 and the spark version 2.4.1

Upvotes: 3

Views: 4116

Answers (1)

Shubham Jain
Shubham Jain

Reputation: 5526

Since you are using IntegerType directly without calling it is causing issue

def _process():
    spark = SparkSession.builder.master("local").appName('process').getOrCreate()
    spark_udf = udf(square,IntegerType())

Try by calling the type IntegerType() and it should work fine.

Upvotes: 5

Related Questions