Reputation: 153
Do User Defined Functions (UDFs) in Spark work in a distributed way when the data is stored on different nodes, or does Spark accumulate all the data on the master node for processing? If they work in a distributed way, can we convert any Python function, whether pre-defined or user-defined, into a Spark UDF as shown below:
spark.udf.register("myFunctionName", functionNewName)
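For context, here is a minimal sketch of the pattern being asked about; the function, view, and column names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.appName("udf-example").getOrCreate()

    # A plain Python function (could be pre-defined or user-defined)
    def add_one(x):
        return x + 1 if x is not None else None

    # Register it so it can be used inside SQL expressions
    spark.udf.register("addOne", add_one, IntegerType())

    spark.createDataFrame([(1,), (2,), (3,)], ["value"]) \
         .createOrReplaceTempView("numbers")
    spark.sql("SELECT value, addOne(value) AS plus_one FROM numbers").show()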
Upvotes: 3
Views: 1869
Reputation: 1452
A Spark DataFrame is distributed across the cluster in partitions. The UDF is applied to each partition on the executor that holds it, rather than on the driver, so the answer is yes: UDFs run in a distributed way. You can also confirm this in the Spark UI.
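A minimal sketch illustrating this, with hypothetical function and column names: the UDF is serialized and shipped to the executors, which apply it to the rows of the partitions they hold, without collecting data to the driver for that step.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-partitions").getOrCreate()

    # A DataFrame spread over several partitions
    df = spark.range(0, 1000, numPartitions=4)

    # A user-defined Python function wrapped as a Spark UDF
    label = udf(lambda x: "even" if x % 2 == 0 else "odd", StringType())

    # The UDF runs on each executor against the partitions it holds
    result = df.withColumn("parity", label(df["id"]))

    print(result.rdd.getNumPartitions())  # still 4 partitions
    result.show(5)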
Upvotes: 4