Reputation: 23
I would like to query, if anyone has an idea, how to get the second lowest value in a row of Dataframe in pyspark.
For example:
Input Dataframe:
Col1 Col2 Col3 Col4
83 32 14 62
63 32 74 55
13 88 6 46
Expected output:
Col1 Col2 Col3 Col4 Res
83 32 14 62 32
63 32 74 55 55
13 88 6 46 13
Upvotes: 2
Views: 2017
Reputation: 32640
You can create an array column using array
function and then sort it using array_sort
. Finally, get the second element using element_at
. These 2 last functions are available from Spark 2.4+.
df.withColumn("res", element_at(array_sort(array(*[col(c) for c in df.columns])), 2))\
.show()
#+----+----+----+----+---+
#|Col1|Col2|Col3|Col4|res|
#+----+----+----+----+---+
#|83 |32 |14 |62 |32 |
#|63 |32 |74 |55 |55 |
#|13 |88 |6 |46 |13 |
#+----+----+----+----+---+
Another way of doing is to use least
function. First, calculate the min value from all columns then calculate another time the least from values that are greater than min
using when
expression:
df.withColumn("min", least(*[col(c) for c in df.columns]))\
.withColumn("res", least(*[when(col(c) > col("min"), col(c)) for c in df.columns]))\
.drop("min")\
.show()
Upvotes: 1
Reputation: 31460
We can use concat_ws
function to concat all columns for the row then use split
to create an array.
use array_sort
function to sort with in the array and extract second element[1]
of the array.
Example:
from pyspark.sql.functions import *
df=spark.createDataFrame([('83','32','14','62'),('63','32','74','55'),('13','88','6','46')],['Col1','Col2','Col3','Col4'])
df.selectExpr("array_sort(split(concat_ws(',',Col1,Col2,Col3,Col4),','))[1] Res").show()
#+---+
#|Res|
#+---+
#|32 |
#|55 |
#|13 |
#+---+
More Dynamic Way:
df.selectExpr("array_sort(split(concat_ws(',',*),','))[1]").show()
#+---+
#|Res|
#+---+
#|32 |
#|55 |
#|13 |
#+---+
EDIT:
#adding Res column to the dataframe
df1=df.selectExpr("*","array_sort(split(concat_ws(',',*),','))[1] Res")
df1.show()
#+----+----+----+----+---+
#|Col1|Col2|Col3|Col4|Res|
#+----+----+----+----+---+
#| 83| 32| 14| 62| 32|
#| 63| 32| 74| 55| 55|
#| 13| 88| 6| 46| 46|
#+----+----+----+----+---+
Upvotes: 2