Nabih Bawazir

Reputation: 7275

How to pivot by value in pyspark

Here's my input

+----+-----+---+------+----+------+-------+--------+
|year|month|day|new_ts|hour|minute|ts_rank|   label|
+----+-----+---+------+----+------+-------+--------+
|2022|    1|  1|    13|  13|    24|      1|       7|
|2022|    1|  1|    14|  13|    24|      1|       8|
|2022|    1|  2|    15|  13|    24|      1|       7|
|2022|    1|  2|    16|  13|    44|      7|       8|
+----+-----+---+------+----+------+-------+--------+

Here's my expected output

+----+-----+---+-------+--------+
|year|month|day|      7|       8|
+----+-----+---+-------+--------+
|2022|    1|  1|     13|      14|
|2022|    1|  2|     15|      16|
+----+-----+---+-------+--------+

Here's the pandas code

df_pivot = df.pivot(index=["year","month","day"], columns="label", values="new_ts").reset_index()

What I tried

df_pivot = df.groupBy(["year","month","day"]).pivot("label").value("new_ts")

Note: sorry, I can't show the error message here. I'm using a cloud solution that only shows the line where the error occurred, not the message itself.

Upvotes: 1

Views: 32

Answers (1)

wwnde

Reputation: 26676

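pivot() returns grouped data, which has no value() method, so it has to be followed by an aggregation. Since the sample has one new_ts per day and label, first is enough:

from pyspark.sql.functions import first

df.groupBy("year", "month", "day").pivot("label").agg(first("new_ts")).show()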


+----+-----+---+---+---+
|year|month|day|  7|  8|
+----+-----+---+---+---+
|2022|    1|  1| 13| 14|
|2022|    1|  2| 15| 16|
+----+-----+---+---+---+
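
To make it clearer what groupBy / pivot / agg(first(...)) computes, here is a rough plain-Python sketch run on the question's sample rows. It is only an illustration of the semantics, not Spark code and not how Spark implements it. The helper name pivot_first is made up for this example, and "first" here means "first row seen", whereas Spark's first gives no ordering guarantee unless the data is ordered.

```python
# Plain-Python illustration of groupBy(...).pivot("label").agg(first("new_ts")).
# pivot_first is a hypothetical helper, not part of PySpark.
rows = [
    {"year": 2022, "month": 1, "day": 1, "new_ts": 13, "label": 7},
    {"year": 2022, "month": 1, "day": 1, "new_ts": 14, "label": 8},
    {"year": 2022, "month": 1, "day": 2, "new_ts": 15, "label": 7},
    {"year": 2022, "month": 1, "day": 2, "new_ts": 16, "label": 8},
]


def pivot_first(rows, keys, pivot_col, value_col):
    """Group by `keys`, turn each distinct `pivot_col` value into a column,
    and keep the first `value_col` seen for each (group, pivot value) pair."""
    columns = sorted({r[pivot_col] for r in rows})
    groups = {}
    for r in rows:
        key = tuple(r[k] for k in keys)
        cells = groups.setdefault(key, {})
        # setdefault keeps the value already stored, so the first one wins
        cells.setdefault(r[pivot_col], r[value_col])
    result = []
    for key, cells in groups.items():
        out = dict(zip(keys, key))
        for c in columns:
            # a (group, label) pair with no row becomes None, like null in Spark
            out[str(c)] = cells.get(c)
        result.append(out)
    return result


pivoted = pivot_first(rows, ["year", "month", "day"], "label", "new_ts")
for row in pivoted:
    print(row)
```

Each distinct label becomes its own column, which is why the aggregation is needed: it decides which new_ts to keep when a group has more than one row for the same label.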

Upvotes: 1
