Shubham Sahay

Reputation: 153

Do User Defined Functions (UDFs) in Spark work in a distributed way?

Do User Defined Functions (UDFs) in Spark run in a distributed way when the data is stored on different nodes, or does Spark collect all the data onto the master node for processing? If they do run in a distributed way, can any Python function, whether pre-defined or user-defined, be converted into a Spark UDF as shown below?

spark.udf.register("myFunctionName", functionNewName)

Upvotes: 3

Views: 1884

Answers (1)

Shadowtrooper

Reputation: 1462

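A Spark DataFrame is distributed across the cluster in partitions. The UDF is applied to each partition on the executor that holds it, so the answer is yes: the data is not gathered on the master node. You can also see this in the Spark UI, where the stage that applies the UDF runs as one task per partition.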
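To make this concrete, here is a minimal sketch of registering an ordinary Python function as a UDF and checking which partition each row was processed in. The function `shout`, the helper `run_demo`, the column names, and the `local[2]` master are all made up for illustration and are not from the question. The return type is passed explicitly because `spark.udf.register` treats the result as a string when no return type is given.

```python
def shout(text):
    """Plain Python function (hypothetical example); nothing Spark-specific in it."""
    return None if text is None else text.upper() + "!"


def run_demo():
    # Imports are local so that shout() stays usable without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import spark_partition_id
    from pyspark.sql.types import StringType

    # local[2] is an assumption for a laptop run; on a cluster the master differs.
    spark = SparkSession.builder.master("local[2]").appName("udf-demo").getOrCreate()

    # Register the function for SQL use; the returned object works in the DataFrame API.
    shout_udf = spark.udf.register("shout", shout, StringType())

    # Spread a tiny DataFrame over two partitions.
    df = spark.createDataFrame([("a",), ("b",), ("c",), (None,)], ["word"]).repartition(2)

    # spark_partition_id() reports the partition in which each row is evaluated.
    df.select(
        "word",
        shout_udf("word").alias("shouted"),
        spark_partition_id().alias("partition"),
    ).show()

    # The same UDF is callable from SQL under its registered name.
    df.createOrReplaceTempView("words")
    spark.sql("SELECT word, shout(word) AS shouted FROM words").show()

    spark.stop()
```

Calling `run_demo()` prints both result tables. The `partition` column is there only to show that rows are handled where they live rather than on the driver. In practice a function converts cleanly when Spark can serialize it and ship it to the executors, and when its return value matches the declared return type.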

Upvotes: 4
