Steven

Reputation: 15258

convert output of UDF

In PySpark, I defined a UDF as follows:

from pyspark.sql.functions import udf
from scipy.spatial.distance import cdist

def closest_point(point, points):
    """ Find closest point from a list of points. """
    return points[cdist([point], points).argmin()]

udf_closest_point = udf(closest_point)

dfC1 = dfC1.withColumn("closest", udf_closest_point(dfC1.point, dfC1.points))

And my data looks like this:

What should I change so that my UDF returns an array of floats instead of a string?

Upvotes: 0

Views: 131

Answers (1)

akuiper

Reputation: 214987

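When no return type is given, udf() defaults to StringType, which is why the result comes back as a string. Specify the return type as an array of floats, ArrayType(FloatType()), when you create the UDF: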

from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, FloatType
udf_closest_point = udf(closest_point, ArrayType(FloatType()))
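For reference, here is a minimal end-to-end sketch of the same idea. It is an illustration under assumptions, not the asker's exact setup: the one-row DataFrame and the helper names make_closest_point_udf and demo are hypothetical, and the distance is computed with the standard-library math.dist (Python 3.8+) in place of cdist, only so the helper can be tried without SciPy. The explicit float() casts are there because a UDF should hand back built-in Python types. If the source column holds doubles, ArrayType(DoubleType()) may be the better choice, since FloatType is single precision.

```python
import math


def closest_point(point, points):
    """Return the entry of `points` nearest to `point` as a list of plain floats."""
    # Assumption: math.dist stands in for scipy's cdist(...).argmin() here.
    nearest = min(points, key=lambda candidate: math.dist(point, candidate))
    # Cast each coordinate so the UDF returns built-in Python floats.
    return [float(x) for x in nearest]


def make_closest_point_udf():
    """Wrap closest_point as a UDF declared to return array<float>."""
    # Imported lazily so closest_point can be tested without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, FloatType

    return udf(closest_point, ArrayType(FloatType()))


def demo():
    """Hypothetical one-row example; column names follow the question."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame(
        [([0.0, 0.0], [[1.0, 1.0], [3.0, 4.0]])],
        ["point", "points"],
    )
    df = df.withColumn("closest", make_closest_point_udf()(df.point, df.points))
    df.printSchema()  # the "closest" column should now be an array, not a string
    spark.stop()
```

Calling demo() on a machine with PySpark installed prints the resulting schema, which is a quick way to confirm the column type before running on real data.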

Upvotes: 1
