I want to get the min value of the column in a PySpark dataframe

Question

I want to create a new column with the min value of compare_at_price. If every row contains values, we can easily get the min value with sumList1 = udf(lambda c: min(c), IntegerType()). But some rows in my dataframe contain only commas, and others are empty:

+--------------------+
|    compare_at_price|
+--------------------+
|               [,,,]|
|                  []|
|               [,,,]|
|[89.95, 89.95, 89.95|
|                  []|
|                  []|

Can you please help me solve this?

Yayati Sule · Accepted Answer

You can find the minimum of an ArrayType column in the following way:

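from pyspark.sql.functions import col, array_min

# array_min (Spark 2.4+) returns the smallest element of each array; null elements are skipped
resultDF = df.select(array_min(col("compare_at_price")).alias('min_price'))
resultDF.show(truncate=False)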
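If the [,,,] rows turn out to hold empty strings rather than nulls (the question does not show the schema, so this is an assumption), the minimum of those rows could come back as an empty string instead of null. One way around that, staying close to the UDF idea from the question, is a small helper that skips blank and null entries before taking the minimum. The name safe_min and the output column name are made up for illustration; this is a sketch, not a tested production solution.

```python
from typing import Iterable, Optional


def safe_min(values: Optional[Iterable]) -> Optional[float]:
    """Return the smallest numeric entry, or None when there is nothing usable."""
    if values is None:
        return None
    numbers = []
    for v in values:
        # Skip nulls and blank strings, e.g. the entries of a [,,,] row
        # (assumption: those entries are "" or None).
        if v is None or str(v).strip() == "":
            continue
        try:
            numbers.append(float(v))
        except (TypeError, ValueError):
            # Ignore entries that cannot be parsed as a number.
            continue
    # An empty or all-blank array yields None instead of raising ValueError.
    return min(numbers) if numbers else None


# Hypothetical wiring into Spark, assuming a DataFrame `df` as in the question:
# from pyspark.sql.functions import udf, col
# from pyspark.sql.types import DoubleType
# safe_min_udf = udf(safe_min, DoubleType())
# df = df.withColumn("min_price", safe_min_udf(col("compare_at_price")))
```

Rows such as [] and [,,,] then produce null in the new column, while rows with prices produce their smallest value as a double. A Python UDF is usually slower than a built-in function, so array_min is preferable whenever the arrays are already clean.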
