ashim

Reputation: 25580

PySpark minimum of a list

How do I find the minimum of a list that is stored in a single cell? I can write a UDF, but that feels like overkill. The min function from pyspark.sql.functions is an aggregate: it operates across rows (for example, on the result of groupBy), not on the elements of an array within one row.

from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType
min_ = F.udf(lambda inarr: min(inarr), IntegerType())
myDataFrameWithMin = myDataFrame.withColumn('min_value', min_(F.col('position_list')))

Upvotes: 1

Views: 1436

Answers (2)

Evan Zamir

Reputation: 8481

Just sort and then take the first value/row.

df.sort(col, ascending=True)

Upvotes: 0

Mariusz

Reputation: 13946

If you imported everything from pyspark.sql.functions, so that Python's built-in min is shadowed, you can still reach the built-in through the __builtins__ prefix, for example:

min_ = udf(lambda inarr: __builtins__.min(inarr), IntegerType())
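One caveat: `__builtins__` is a CPython implementation detail, and inside an imported module it can be a plain dict, in which case `__builtins__.min` raises an AttributeError. The standard `builtins` module is a more portable route. The sketch below is pure Python and needs no Spark; the `min` function it defines is a hypothetical stand-in for the name that `from pyspark.sql.functions import *` would bind.

```python
import builtins

# Hypothetical stand-in for the shadowing name; it only marks that the
# built-in min is no longer what the bare name `min` refers to.
def min(col):
    return f"Column<min({col})>"

values = [5, 3, 9]

shadowed = min(values)           # calls the stand-in, not Python's min
smallest = builtins.min(values)  # always resolves to Python's built-in min
```

Inside the UDF, the lambda body would then be `builtins.min(inarr)` instead of `__builtins__.min(inarr)`.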

Upvotes: 1
