Reputation: 15258
In PySpark, I defined a UDF as follows:
from pyspark.sql.functions import udf
from scipy.spatial.distance import cdist

def closest_point(point, points):
    """ Find closest point from a list of points. """
    return points[cdist([point], points).argmin()]

udf_closest_point = udf(closest_point)
dfC1 = dfC1.withColumn("closest", udf_closest_point(dfC1.point, dfC1.points))
And my data looks like this:
What should I change so that my UDF returns an array of floats instead of a string?
Upvotes: 0
Views: 131
Reputation: 214987
You can specify the UDF's return type as an array of floats, ArrayType(FloatType()). When no return type is given, udf defaults to StringType, which is why you get a string back:
from pyspark.sql.types import ArrayType, FloatType
udf_closest_point = udf(closest_point, ArrayType(FloatType()))
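For completeness, here is a minimal sketch of the whole thing wired together. A few assumptions that are mine, not from the question: the distance is computed with the standard library's math.dist (Python 3.8+) as a plain-Python stand-in for cdist, the result is explicitly converted to Python floats (NumPy values returned from a UDF generally do not serialize cleanly), and the element type is DoubleType on the guess that the columns are array<double>. Use FloatType instead if your columns hold 32-bit floats. The helper add_closest_column is a hypothetical name used only for illustration.

```python
import math


def closest_point(point, points):
    """Return the entry of `points` nearest to `point` as a list of Python floats."""
    # math.dist is the Euclidean distance; min() keeps the first minimum on ties.
    best = min(points, key=lambda p: math.dist(point, p))
    # Explicit float() so the UDF returns plain Python floats, not NumPy scalars.
    return [float(x) for x in best]


def add_closest_column(df):
    """Hypothetical helper: add a 'closest' column typed as array<double>."""
    # Imported here so closest_point can be tried without a Spark installation.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, DoubleType

    # Declaring the return type is what stops Spark from falling back to a string.
    udf_closest_point = udf(closest_point, ArrayType(DoubleType()))
    return df.withColumn("closest", udf_closest_point(df.point, df.points))
```

With this in place, dfC1 = add_closest_column(dfC1) should give a "closest" column of array type, which you can confirm with dfC1.printSchema().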
Upvotes: 1