How to dynamically transpose a single Column to multiple Rows in pyspark?

Question

I have a dataframe that looks like this:

ColName
a
b
c
d
e
f
g
h
i
j
k
l

Based on a specific parameter, I want to transpose those values into rows. For example, if the parameter value is 3, the new dataframe will look like this:

Col1	Col2	Col3
a	b	c
d	e	f
g	h	i
j	k	l

However, if the parameter value is 4, it will look like this:

Col1	Col2	Col3	Col4
a	b	c	d
e	f	g	h
i	j	k	l

A few things to notice:

The column names are not important
Both the number of items in that single column and the parameter can change

Any idea how to achieve this in pyspark? Thanks in advance.

mck · Accepted Answer

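You can add helper columns holding each value's target row and column index, then pivot on them:

import pyspark.sql.functions as F
from pyspark.sql import Window

x = 3  # number of columns in the result

# monotonically_increasing_id() is increasing but not guaranteed to be
# consecutive across partitions, so convert it to a 0-based consecutive index.
# Note: a window without partitionBy pulls all rows into a single partition.
w = Window.orderBy(F.monotonically_increasing_id())

result = df.withColumn(
    'id',
    F.row_number().over(w) - 1
).withColumn(
    'id2',
    (F.col('id') / x).cast('int')   # target row index
).withColumn(
    'id3',
    F.col('id') % x                 # target column index
).groupBy('id2').pivot('id3').agg(F.first('ColName')).orderBy('id2').drop('id2')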

result.show()
+---+---+---+
|  0|  1|  2|
+---+---+---+
|  a|  b|  c|
|  d|  e|  f|
|  g|  h|  i|
|  j|  k|  l|
+---+---+---+
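
To see why the two helper columns work, here is a minimal plain-Python sketch of the same index arithmetic, with no Spark involved. The function name reshape is hypothetical and only for illustration: integer division of the position by x gives the target row (the role of id2), and the remainder gives the target column (the role of id3).

```python
def reshape(values, x):
    """Group a flat list into rows of x items, mirroring id2 and id3."""
    rows = {}
    for i, value in enumerate(values):
        # row_idx plays the role of id2, col_idx the role of id3
        row_idx, col_idx = divmod(i, x)
        rows.setdefault(row_idx, {})[col_idx] = value
    # Emit rows in order; missing cells become None
    return [[cols.get(c) for c in range(x)] for _, cols in sorted(rows.items())]


letters = list("abcdefghijkl")
print(reshape(letters, 4))
# [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j', 'k', 'l']]
```

If the number of items is not a multiple of x, the last row in this sketch is padded with None. The pivot in the answer should behave the same way, leaving null in the cells that have no matching value.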
