LUZO

Reputation: 1029

Pivot in PySpark SQL

I have a requirement to use pivot on the table below.

id,week,score
102,1,96
101,1,138
102,1,37
101,1,59
101,2,282
102,2,212
102,2,78
101,2,97
102,3,60
102,3,123
101,3,220
101,3,87

output

      1        2        3
101   138,59   282,97   220,87
102   96,37    212,78   123,60

Here I also need the scores sorted (descending, as in the output above).

I have tried the code below, but it only works when there is a single record for a particular id:

from pyspark.sql.functions import first

df.groupBy("id").pivot("week").agg(first("score"))
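
For reference, the sample data can be built as a DataFrame like this (a minimal sketch, assuming a local SparkSession):

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

# sample rows from the table above: (id, week, score)
data = [
    (102, 1, 96), (101, 1, 138), (102, 1, 37), (101, 1, 59),
    (101, 2, 282), (102, 2, 212), (102, 2, 78), (101, 2, 97),
    (102, 3, 60), (102, 3, 123), (101, 3, 220), (101, 3, 87),
]
df = spark.createDataFrame(data, ["id", "week", "score"])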

Upvotes: 0

Views: 359

Answers (2)

Ramesh Maharjan

Reputation: 41957

The equivalent PySpark code for the Scala answer posted by Prasad Khode is as follows:

from pyspark.sql import functions as F
df.groupBy("id").pivot("week").agg(F.collect_list("score")).show()

If you check the API documentation, you can see:

collect_list(Column e)
Aggregate function: returns a list of objects with duplicates.

You can use collect_set as well, which will give you the same output but with duplicates removed.

df.groupBy("id").pivot("week").agg(F.collect_set("score")).show()

The API documentation says the following:

collect_set(Column e)
Aggregate function: returns a set of objects with duplicate elements eliminated.
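
Since the question also asks for the scores to be sorted, one option (a sketch, not part of the original answer) is to wrap collect_list in sort_array; asc=False matches the descending order shown in the expected output:

from pyspark.sql import functions as F

(df.groupBy("id")
   .pivot("week")
   .agg(F.sort_array(F.collect_list("score"), asc=False))  # sort each collected list descending
   .show())

Note that neither collect_list nor collect_set guarantees any element order, so sorting after collection is needed either way.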

Upvotes: 1

Prasad Khode

Reputation: 6739

Instead of first, you should collect all the values using collect_list; this will give you the result as a list:

import org.apache.spark.sql.functions._

df.groupBy("id").pivot("week").agg(collect_list("score")).show()

Output:

+---+---------+---------+---------+
|id |1        |2        |3        |
+---+---------+---------+---------+
|101|[138, 59]|[282, 97]|[220, 87]|
|102|[96, 37] |[212, 78]|[60, 123]|
+---+---------+---------+---------+
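
If the goal is the exact comma-joined strings from the question's expected output rather than arrays, a PySpark sketch (an addition, assuming the same df) sorts each list numerically, casts it to strings, and joins the elements with concat_ws:

from pyspark.sql import functions as F

(df.groupBy("id")
   .pivot("week")
   .agg(F.concat_ws(",",  # join the sorted scores into a string like "138,59"
        F.sort_array(F.collect_list("score"), asc=False).cast("array<string>")))
   .show())

This should produce the layout from the question, e.g. 138,59 for id 101 in week 1.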

Upvotes: 3
