Reputation: 285
I want to create a new column holding the minimum value of compare_at_price. When every row contains values, the minimum is easy to get with
sumList1 = udf(lambda c: min(c), IntegerType())
However, some rows in my DataFrame contain only commas:
+--------------------+
| compare_at_price|
+--------------------+
| [,,,]|
| []|
| [,,,]|
|[89.95, 89.95, 89.95|
| []|
| []|
Can you please help me solve this?
Upvotes: 0
Views: 7168
Reputation: 1631
You can find the minimum of an ArrayType column in the following way:
from pyspark.sql.functions import col,array_min
resultDF = df.select(array_min(col("compare_at_price")).alias('min_price'))
resultDF.show(False)
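array_min was added in Spark 2.4 and skips NULL elements. On older versions, a null-safe UDF is one possible fallback. The sketch below assumes that the rows displayed as [,,,] are arrays of nulls and that the prices are doubles; neither is confirmed by the question. The helper name safe_min and the output column name min_price in the comments are hypothetical. The Spark wiring is left as comments so that the helper can be run and checked without a cluster.

```python
from typing import Optional, Sequence


def safe_min(values: Optional[Sequence[Optional[float]]]) -> Optional[float]:
    """Return the smallest non-null element, or None if there is none.

    Handles three cases that a bare min() would fail on:
    a null array, an empty array, and an array containing only nulls.
    """
    if values is None:
        return None
    # Drop null elements before comparing (assumption: [,,,] means nulls).
    present = [v for v in values if v is not None]
    return min(present) if present else None


# Possible Spark wiring (requires pyspark; assumes a DataFrame named df):
# from pyspark.sql.functions import udf
# from pyspark.sql.types import DoubleType
# safe_min_udf = udf(safe_min, DoubleType())
# df = df.withColumn("min_price", safe_min_udf("compare_at_price"))
```

Note the DoubleType return type: with IntegerType, a UDF returning a value such as 89.95 would yield null, because the Python float does not match the declared type.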
Upvotes: 2
Reputation: 94
First, import:
import org.apache.spark.sql.functions.{min, max}
Then:
df.agg(min("compare_at_price")).show()
Upvotes: 2