williamscathy825

Reputation: 195

How to transpose a Spark DataFrame using Python (PySpark)?

I have a Spark DataFrame as follows:

p   a b c 
p1  2 2 1
p2  4 3 2

I want to transpose it to the format below using PySpark:

p  col1 col2
p1  a     2
p1  b     2
p1  c     1
p2  a     4
p2  b     3
p2  c     2

How can I do this?

Upvotes: 1

Views: 73

Answers (1)

notNull

Reputation: 31540

Try the arrays_zip and explode functions: zip an array of the column-name literals with an array of the corresponding values, then explode the result.

Example:

df.show()
#+---+---+---+---+
#|  p|  a|  b|  c|
#+---+---+---+---+
#| p1|  2|  2|  1|
#| p2|  4|  3|  2|
#+---+---+---+---+
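
For reproducibility, a minimal sketch that builds this sample DataFrame (it assumes an existing SparkSession named spark; the values come from the question):

df = spark.createDataFrame(
    [("p1", 2, 2, 1), ("p2", 4, 3, 2)],  # rows from the question
    ["p", "a", "b", "c"],                # column names
)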

df.withColumn("arr",explode(arrays_zip(array(lit("a"),lit("b"),lit("c")),array(col("a"),col("b"),col("c"))))).\
select("p","arr.*").\
withColumnRenamed("0","col1").\
withColumnRenamed("1","col2").\
show()

# dynamically build the name and value arrays from the DataFrame's columns
names = [lit(c) for c in df.columns if c != 'p']
values = [col(c) for c in df.columns if c != 'p']

df.withColumn("arr", explode(arrays_zip(array(*names), array(*values)))).\
    select("p", "arr.*").\
    withColumnRenamed("0", "col1").\
    withColumnRenamed("1", "col2").\
    show()

#+---+----+----+
#|  p|col1|col2|
#+---+----+----+
#| p1|   a|   2|
#| p1|   b|   2|
#| p1|   c|   1|
#| p2|   a|   4|
#| p2|   b|   3|
#| p2|   c|   2|
#+---+----+----+
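
As an aside, the same unpivot can also be written with the SQL stack generator through expr; this is a minimal sketch with the three column names hardcoded, not something taken from the answer above:

from pyspark.sql.functions import expr

# stack(3, ...) emits three rows of (col1, col2) per input row
df.select("p", expr("stack(3, 'a', a, 'b', b, 'c', c) as (col1, col2)")).show()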

Upvotes: 1
