Reputation: 151
For a Spark DataFrame via PySpark, we can use pyspark.sql.functions.udf to create a user-defined function (UDF).
I wonder if I can use any function from Python packages in udf(), e.g., np.random.normal from numpy?
Upvotes: 8
Views: 10195
Reputation: 5433
Assuming you want to add a column named new to your DataFrame df, constructed by calling numpy.random.normal repeatedly, you could do:
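import numpy
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

# udf() is the documented factory for wrapping a Python callable.
# float() guarantees a plain Python float, which DoubleType expects.
random_normal = udf(lambda: float(numpy.random.normal()), DoubleType())
# Calling the UDF with no arguments evaluates it once per row.
df_with_new_column = df.withColumn('new', random_normal())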
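More generally, any Python function can go inside a UDF as long as its package is installed on every worker node and its return value matches the declared Spark type. A slightly more defensive sketch of the same idea follows. The names sample_normal and add_normal_column are hypothetical helpers, not part of PySpark or NumPy, and asNondeterministic() assumes Spark 2.3 or later:

```python
import numpy


def sample_normal():
    """Draw one standard-normal sample as a plain Python float."""
    # float() strips any NumPy scalar type; DoubleType expects a Python float.
    return float(numpy.random.normal())


def add_normal_column(df, name="new"):
    """Return df with an extra column of normal draws (hypothetical helper).

    pyspark is imported here so sample_normal can be tried on its own,
    without a Spark installation.
    """
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    # Assumption: Spark 2.3+. asNondeterministic() tells the optimizer not
    # to assume that repeated calls return the same value.
    normal_udf = udf(sample_normal, DoubleType()).asNondeterministic()
    return df.withColumn(name, normal_udf())
```

With an existing DataFrame df, the call would be df_with_new_column = add_normal_column(df). Marking the UDF as nondeterministic matters here because Spark otherwise treats UDFs as deterministic and may re-evaluate or deduplicate calls during optimization.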
Upvotes: 13