Shubham Sahay

Do user-defined functions (UDFs) in Spark work in a distributed way?

Do user-defined functions (UDFs) in Spark work in a distributed way when the data is stored on different nodes, or do they gather all the data onto the master node for processing? If they do work in a distributed way, can any Python function, whether built-in or user-defined, be converted into a Spark UDF as shown below:

spark.udf.register("myFunctionName", functionNewName)

Answers (1)

Shadowtrooper

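A Spark DataFrame is split into partitions that are distributed across the cluster. A UDF is applied to each partition by the executor that holds it, so yes, UDFs run in a distributed way and the data is not first gathered onto the driver (master) node. The same applies to a Python function you register with spark.udf.register, as long as Spark can serialize it and ship it to the executors and any libraries it uses are installed there. You can see this in the Spark UI, where the stage that evaluates the UDF runs one task per partition.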
