Pyspark Transpose

Question

I have data in the below format with 38 measure columns for various months as shown below.

+---------+-----------------+-----------------+------+------------------+------------------+------------------+---------+------------------+
| Cust_No | Measure1_month1 | Measure1_month2 | .... | Measure1_month72 | Measure2_month_1 | Measure2_month_2 | ….so on | Measure2_month72 |....Measure38_month1...
+---------+-----------------+-----------------+------+------------------+------------------+------------------+---------+------------------+
|       1 |              10 |              20 | ….   |              500 |               40 |               50 | …       |                  |
|       2 |              20 |              40 | ….   |              800 |               70 |              150 | …       |                  |
+---------+-----------------+-----------------+------+------------------+------------------+------------------+---------+------------------+

I want to achieve the below format using PYSPARK.

+---------+-------+----------+----------+
| CustNum | Month | Measure1 | Measure2.......measure38 |
+---------+-------+----------+----------+
|       1 |     1 |       10 |       30 |
|       1 |     2 |       20 |       40 |
|       1 |     3 |       30 |       80 |
|       1 |     4 |       70 |       90 |
|       1 |     5 |       40 |      100 |
|       . |     . |        . |        . |
|       . |     . |        . |        . |
|       1 |    72 |      700 |       50 |
+---------+-------+----------+----------+

and so on for every customer number

Can you please help me with this?

Thanks

Pyspark Transpose

Answers (1)

Related Questions