Reputation: 25580
How to find a minimum of a list that is stored in a cell?
I can write a UDF, but that feels like overkill. The min function from pyspark.sql.functions only works on groups (that is, on the result of groupBy).
min_ = udf(lambda inarr: min(inarr), IntegerType())
myDataFrameWithMin = myDataFrame.withColumn('min_value', min_(F.col('position_list')))
Upvotes: 1
Views: 1436
Reputation: 8481
Just sort and then take the first value/row.
df.sort(col, ascending=True).first()
Upvotes: 0
Reputation: 13946
If you imported everything from pyspark.sql.functions (for example with a star import) and Python's built-in min is shadowed, you can still reach it through the __builtins__ prefix, for example:
min_ = udf(lambda inarr: __builtins__.min(inarr), IntegerType())
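A slightly more portable variant of the same idea, as a minimal sketch: __builtins__ is a CPython implementation detail and is a dict rather than a module outside the main script, so importing the standard-library builtins module may be safer. The local min below is only a hypothetical stand-in for the shadowing PySpark function, and list_min is an illustrative helper name, not part of any library.

```python
import builtins  # standard-library module that always exposes the original built-ins


def min(col):
    """Hypothetical stand-in for pyspark.sql.functions.min after a star import."""
    raise TypeError("column aggregate, not usable on a plain Python list")


def list_min(values):
    """Return the smallest element of a list, or None for a null or empty list."""
    if not values:
        return None
    # builtins.min is Python's own min, regardless of what the name `min` now points to.
    return builtins.min(values)
```

Assuming PySpark is available, list_min could then be wrapped the same way as above, for example udf(list_min, IntegerType()); handling None up front avoids an error on rows where the list is null.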
Upvotes: 1