Reputation: 195
I have a Spark dataframe as follows:
p   a  b  c
p1  2  2  1
p2  4  3  2
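For reference, a minimal sketch that builds this input, assuming an active SparkSession named spark:
df = spark.createDataFrame(
    [("p1", 2, 2, 1), ("p2", 4, 3, 2)],
    ["p", "a", "b", "c"],
)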
I want to transpose it to below format using PySpark code:
p   col1  col2
p1  a     2
p1  b     2
p1  c     1
p2  a     4
p2  b     3
p2  c     2
How can I do this?
Upvotes: 1
Views: 73
Reputation: 31540
Try with the arrays_zip and explode functions.
Example:
df.show()
#+---+---+---+---+
#| p| a| b| c|
#+---+---+---+---+
#| p1| 2| 2| 1|
#| p2| 4| 3| 2|
#+---+---+---+---+
df.withColumn("arr",explode(arrays_zip(array(lit("a"),lit("b"),lit("c")),array(col("a"),col("b"),col("c"))))).\
select("p","arr.*").\
withColumnRenamed("0","col1").\
withColumnRenamed("1","col2").\
show()
#dynamically getting column names (and values) from the dataframe,
#so neither array has to be hardcoded
cols=[c for c in df.columns if c != 'p']
df.withColumn("arr",explode(arrays_zip(array(*[lit(c) for c in cols]),array(*[col(c) for c in cols])))).\
select("p","arr.*").\
withColumnRenamed("0","col1").\
withColumnRenamed("1","col2").\
show()
#+---+----+----+
#| p|col1|col2|
#+---+----+----+
#| p1| a| 2|
#| p1| b| 2|
#| p1| c| 1|
#| p2| a| 4|
#| p2| b| 3|
#| p2| c| 2|
#+---+----+----+
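As an alternative sketch, Spark SQL's stack function does the same unpivot in a single expression (column names hardcoded from the example above):
df.selectExpr("p","stack(3, 'a', a, 'b', b, 'c', c) as (col1, col2)").show()
On Spark 3.4+ the built-in DataFrame.unpivot (alias melt) covers this case as well.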
Upvotes: 1