Johanna

Reputation: 165

How to use udf functions in pyspark

I am analysing the following piece of code:

from pyspark.sql.functions import udf, col, desc
from pyspark.sql.types import FloatType

def error(value, pred):
    # absolute error between the actual value and the prediction
    return abs(value - pred)

udf_MAE = udf(lambda value, pred: error(value=value, pred=pred), FloatType())

I know a udf is a user-defined function, but I don't understand what that means. `udf` wasn't defined anywhere earlier in the code?

Upvotes: 1

Views: 2727

Answers (1)

Israel Phiri

Reputation: 139

User Defined Functions (UDFs) are useful when you need to define logic specific to your use case and when you need to encapsulate that solution for reuse. They should only be used when there is no clear way to accomplish a task using built-in functions. (From the Azure Databricks documentation.)

Create your function (after you have made sure there is no built-in function that performs a similar task):

def greatingFunc(name):
  return f'hello {name}!'

Then you need to register your function as a UDF by designating the following:

A name for access in Python (myGreatingUDF)

The function itself (greatingFunc)

The return type for the function (StringType)

from pyspark.sql.types import StringType

myGreatingUDF = spark.udf.register("myGreatingUDF", greatingFunc, StringType())

Now you can call your UDF anytime you need it:

guest = 'John'
# Note: calling a registered UDF directly returns a Column expression,
# so it is meant to be used inside a DataFrame or SQL query.
print(myGreatingUDF(guest))

Upvotes: 2
