Reputation: 101
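I have a DataFrame with columns x, y, z, three value columns (a, b, c), and three matching "d" columns (ad, bd, cd). For each row, I want to find the smallest of ad, bd, cd and put the value of its corresponding column (a, b, or c) in a new column called id.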
df:
x y z a ad b bd c cd
4 8 1 1 2 2 8 3 5
7 5 6 1 6 2 3 3 1
7 3 5 1 9 2 4 3 7
result:
x y z id
4 8 1 1
7 5 6 3
7 3 5 2
Upvotes: 1
Views: 708
Reputation: 8410
Try this, using arrays_zip, the higher-order function filter, and array_min.
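from pyspark.sql import functions as F
# zip: array of structs pairing each value (field `0`) with its "d" value (field `1`)
# id: keep only the structs whose field `1` equals the row minimum of ad, bd, cd
# select: take field `0` of the first matching struct as the final id
df.withColumn("zip", F.arrays_zip(F.array('a','b','c'),F.array('ad','bd','cd')))\
.withColumn("id", F.expr("""filter(zip,x-> x.`1`=array_min(array(ad,bd,cd)))"""))\
.select("x","y","z", (F.col("id.0")[0]).alias("id")).show()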
#+---+---+---+---+
#| x| y| z| id|
#+---+---+---+---+
#| 4| 8| 1| 1|
#| 7| 5| 6| 3|
#| 7| 3| 5| 2|
#+---+---+---+---+
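If it helps to see what the zip / filter / min chain does to a single row, here is a rough plain-Python sketch of the same logic, with no Spark involved. The names `PAIRS` and `id_of_min` are made up for illustration, and the rows are just the sample data from the question.

```python
# Sample rows from the question, as plain dicts.
rows = [
    {"x": 4, "y": 8, "z": 1, "a": 1, "ad": 2, "b": 2, "bd": 8, "c": 3, "cd": 5},
    {"x": 7, "y": 5, "z": 6, "a": 1, "ad": 6, "b": 2, "bd": 3, "c": 3, "cd": 1},
    {"x": 7, "y": 3, "z": 5, "a": 1, "ad": 9, "b": 2, "bd": 4, "c": 3, "cd": 7},
]

# Hypothetical helper constant: each value column with its "d" column.
PAIRS = [("a", "ad"), ("b", "bd"), ("c", "cd")]


def id_of_min(row):
    # Like arrays_zip: pair each value with its "d" value.
    zipped = [(row[value_col], row[d_col]) for value_col, d_col in PAIRS]
    # Like array_min: the smallest "d" value in this row.
    smallest = min(d for _, d in zipped)
    # Like filter: keep the values whose "d" equals the minimum.
    matches = [value for value, d in zipped if d == smallest]
    # Like [0]: on a tie, the first match wins.
    return matches[0]


result = [
    {"x": r["x"], "y": r["y"], "z": r["z"], "id": id_of_min(r)} for r in rows
]

for r in result:
    print(r["x"], r["y"], r["z"], r["id"])
```

Note that when two "d" columns tie for the minimum, both the Spark version and this sketch take the first match in a, b, c order.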
Upvotes: 2