Reputation: 213
I found a similiar situation with mine in this line, but he is using SQL server, not pyspark/python: Pivoting Multiple Columns Based On a Single Column
I have a dateset as below:
ID Date Class
1 2021/01/01 math, english
1 2021/01/02 math, english
1 2021/01/03 chinese
1 2021/01/04 math, chemistry
1 2021/01/05 math, english
1 2021/01/06 Chinese
2 2021/01/01 PE
2 2021/01/02 math, chinese
2 2021/01/03 math, english
2 2021/01/04 math, chinese
.......
the desire output should be:
ID Date_1 schedule_1 Date_2 schedule_2 Date_3 schedule_3
1 2021/01/01 math, english 2021/01/03 chinese 2021/01/05 math, chemistry...
1 2021/01/02 math, english 2021/01/06 chinese...
1 2021/01/05 math, english....
2 2021/01/01 PE 2021/01/02 math, chinese 2021/01/03 math, english
2 2021/01/04 math, chinese
I am planning using pivot and groupby, this is my current code, which is not working, and I have no idea on how to solve it.
line2 = line\
.select("ID")\
.groupBy("ID","Class")\
.pivot("Date","Class")\
.agg(expr("coalesce(first(Class), \" \")"))
Any help or ideas or suggestions will be appreciated.
Upvotes: 0
Views: 242
Reputation: 42422
A bit tricky and need some more wrangling:
import pyspark.sql.functions as F
from pyspark.sql import Window
df2 = df.withColumn(
'rn',
F.row_number().over(Window.partitionBy('ID', 'Class').orderBy('Date'))
).withColumn(
'mindate',
F.min('Date').over(Window.partitionBy('ID', 'Class'))
).withColumn(
'rn2',
F.dense_rank().over(Window.partitionBy('ID').orderBy('mindate'))
).groupBy('ID', 'rn').pivot('rn2').agg(
F.first(F.struct('Date', 'Class'))
).orderBy('ID', 'rn')
df3 = df2.select(
'ID',
*[f'{c}.*' for c in df2.columns[2:]]
)
df3.show(truncate=False)
+---+----------+-------------+----------+-------------+----------+---------------+
|ID |Date |Class |Date |Class |Date |Class |
+---+----------+-------------+----------+-------------+----------+---------------+
|1 |2021/01/01|math, english|2021/01/03|chinese |2021/01/04|math, chemistry|
|1 |2021/01/02|math, english|2021/01/06|chinese |null |null |
|1 |2021/01/05|math, english|null |null |null |null |
|2 |2021/01/01|PE |2021/01/02|math, chinese|2021/01/03|math, english |
|2 |null |null |2021/01/04|math, chinese|null |null |
+---+----------+-------------+----------+-------------+----------+---------------+
Upvotes: 1