shreder1921

Reputation: 101

PySpark: get the id of the column with the minimum value across several columns

I have a DataFrame with columns x, y, z, three id columns (a, b, c) and three matching value columns (ad, bd, cd). For each row, I want to find the smallest of the value columns and put the id it belongs to in a new column called id.

df:
x  y  z  a  ad  b  bd  c  cd
4  8  1  1  2   2  8   3  5
7  5  6  1  6   2  3   3  1
7  3  5  1  9   2  4   3  7

result:
x  y  z  id 
4  8  1  1 
7  5  6  3 
7  3  5  2

Upvotes: 1

Views: 708

Answers (1)

murtihash

Reputation: 8410

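Try this. It zips the id columns with the value columns using arrays_zip, keeps the pairs whose value equals array_min of the value columns using the higher-order function filter, and then takes the id from the first matching pair. These functions require Spark 2.4 or later.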

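from pyspark.sql import functions as F

# zip[i] pairs an id with its value: field "0" is the id, field "1" is the value.
# filter keeps the pairs whose value equals the row minimum; [0] takes the first match.
df.withColumn("zip", F.arrays_zip(F.array('a','b','c'), F.array('ad','bd','cd')))\
  .withColumn("id", F.expr("""filter(zip, pair -> pair.`1` = array_min(array(ad, bd, cd)))"""))\
  .select("x", "y", "z", (F.col("id.0")[0]).alias("id")).show()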

#+---+---+---+---+
#|  x|  y|  z| id|
#+---+---+---+---+
#|  4|  8|  1|  1|
#|  7|  5|  6|  3|
#|  7|  3|  5|  2|
#+---+---+---+---+
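
If it helps to see what the expression computes for each row, here is a rough plain-Python sketch of the same zip / min / filter / take-first logic. It is only an illustration and not Spark code: the helper name min_id and the rows list are made up for this example, with the rows copied from the sample data in the question.

```python
# Plain-Python illustration of the per-row logic (not Spark code).
# `min_id` is a hypothetical helper; `rows` mirrors the question's sample data.

def min_id(ids, values):
    """Return the id paired with the smallest value (first one wins on ties)."""
    pairs = list(zip(ids, values))                    # plays the role of arrays_zip
    smallest = min(values)                            # plays the role of array_min
    matches = [p for p in pairs if p[1] == smallest]  # plays the role of filter(...)
    return matches[0][0]                              # first match, id field

rows = [
    # (x, y, z, a, ad, b, bd, c, cd)
    (4, 8, 1, 1, 2, 2, 8, 3, 5),
    (7, 5, 6, 1, 6, 2, 3, 3, 1),
    (7, 3, 5, 1, 9, 2, 4, 3, 7),
]

result = [
    (x, y, z, min_id([a, b, c], [ad, bd, cd]))
    for (x, y, z, a, ad, b, bd, c, cd) in rows
]
print(result)
```

Note that when two value columns tie for the minimum, both this sketch and the Spark expression pick the first matching pair in a, b, c order.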

Upvotes: 2
