T.SURESH ARUNACHALAM

Reputation: 285

I want to get the min value of a column in a PySpark DataFrame

I want to create a new column holding the minimum value of compare_at_price. When every row contains values, this is easy with sumList1 = udf(lambda c: min(c), IntegerType()). However, some rows in my DataFrame contain only commas (empty entries):

+--------------------+
|    compare_at_price|
+--------------------+
|               [,,,]|
|                  []|
|               [,,,]|
|[89.95, 89.95, 89.95|
|                  []|
|                  []|

Can you please help me solve this?

Upvotes: 0

Views: 7168

Answers (2)

Yayati Sule

Reputation: 1631

You can find the minimum of an ArrayType column in the following way (array_min requires Spark 2.4 or later):

from pyspark.sql.functions import col,array_min

resultDF = df.select(array_min(col("compare_at_price")).alias('min_price'))

resultDF.show(False)
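If array_min is not available, or the "[,,,]" rows hold blank strings rather than nulls, the UDF from the question can be made null-safe instead. The following is only a minimal sketch under stated assumptions: safe_min and with_min_price are hypothetical helper names, the empty entries are assumed to be nulls or blank strings, and DoubleType is used because the sample prices (89.95) are not integers.

```python
from typing import Optional, Sequence


def safe_min(values: Optional[Sequence]) -> Optional[float]:
    """Return the smallest usable price, or None when there is none."""
    if values is None:
        return None
    prices = []
    for v in values:
        # Assumption: "[,,,]" rows contain nulls or blank strings; skip both.
        if v is None or (isinstance(v, str) and not v.strip()):
            continue
        # float() raises ValueError for non-numeric strings; handle as needed.
        prices.append(float(v))
    # min() on an empty list raises ValueError, so guard against it.
    return min(prices) if prices else None


def with_min_price(df, column="compare_at_price"):
    """Add a min_price column to a Spark DataFrame (hypothetical helper)."""
    # Imported here so safe_min stays usable without a Spark installation.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import DoubleType

    safe_min_udf = udf(safe_min, DoubleType())
    return df.withColumn("min_price", safe_min_udf(col(column)))
```

Calling with_min_price(df) should then yield a null min_price for the empty and comma-only rows and the smallest price for the others. A built-in function is usually faster than a Python UDF, so this is mainly useful where the built-in does not fit the data.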

Upvotes: 2

sudomudo

Reputation: 94

First, import the aggregate functions (Scala syntax):

import org.apache.spark.sql.functions.{min, max}

Then aggregate over the column:

df.agg(min("compare_at_price")).show()

Upvotes: 2
