MMV

Reputation: 208

How to convert values in a column to an array in an order defined by a list

I have a list of item names like the one below

[hpc_max,asset_min,off_median]

and a table like this:

| item_name | item_value | timestamp                     |
| --------- | ---------- | ---------------------------- |
| hpc_max   | 0.25      | 2023-03-01T17:20:00.000+0000 |
| asset_min | 0.34      | 2023-03-01T17:20:00.000+0000 |
| off_median| 0.30      | 2023-03-01T17:30:00.000+0000 |
| hpc_max   | 0.54      | 2023-03-01T17:30:00.000+0000 |
| asset_min | 0.32      | 2023-03-01T17:35:00.000+0000 |
| off_median| 0.67      | 2023-03-01T17:20:00.000+0000 |
| asset_min | 0.54      | 2023-03-01T17:30:00.000+0000 |
| off_median| 0.32      | 2023-03-01T17:35:00.000+0000 |
| hpc_max   | 0.67      | 2023-03-01T17:35:00.000+0000 |

I want to group the item_value entries by timestamp into arrays, ordered according to the item names in the list.

The output I want is

| item_name                            | item_value       | timestamp                    |
| ------------------------------------ | ---------------- | ---------------------------- |
| ["hpc_max","asset_min","off_median"] | [0.25,0.34,0.67] | 2023-03-01T17:20:00.000+0000 |
| ["hpc_max","asset_min","off_median"] | [0.54,0.54,0.30] | 2023-03-01T17:30:00.000+0000 |
| ["hpc_max","asset_min","off_median"] | [0.67,0.32,0.32] | 2023-03-01T17:35:00.000+0000 |

How can I do this using PySpark?

I would appreciate any help!

Upvotes: 0

Views: 76

Answers (1)

Arud Seka Berne S

Reputation: 953

Your DataFrame (df_1)

+----------+----------+----------------------------+
|item_name |item_value|timestamp                   |
+----------+----------+----------------------------+
|hpc_max   |0.25      |2023-03-01T17:20:00.000+0000|
|asset_min |0.34      |2023-03-01T17:20:00.000+0000|
|off_median|0.3       |2023-03-01T17:30:00.000+0000|
|hpc_max   |0.54      |2023-03-01T17:30:00.000+0000|
|asset_min |0.32      |2023-03-01T17:35:00.000+0000|
|off_median|0.67      |2023-03-01T17:20:00.000+0000|
|asset_min |0.54      |2023-03-01T17:30:00.000+0000|
|off_median|0.32      |2023-03-01T17:35:00.000+0000|
|hpc_max   |0.67      |2023-03-01T17:35:00.000+0000|
+----------+----------+----------------------------+

Try this:

from pyspark.sql.functions import col, collect_list, when

# Rank each row by the desired position of its item_name:
# hpc_max -> 1, asset_min -> 2, off_median -> 3
df_2 = df_1.withColumn(
    "item_index",
    when(col("item_name") == "hpc_max", 1)
    .when(col("item_name") == "asset_min", 2)
    .otherwise(3),
)

# Sort so that collect_list picks the values up in the right order
df_3 = df_2.orderBy("timestamp", "item_index")

# Collapse each timestamp's rows into parallel arrays
df_3.groupBy("timestamp").agg(
    collect_list("item_name").alias("item_name"),
    collect_list("item_value").alias("item_value"),
).show(truncate=False)

Output

+----------------------------+--------------------------------+------------------+
|timestamp                   |item_name                       |item_value        |
+----------------------------+--------------------------------+------------------+
|2023-03-01T17:35:00.000+0000|[hpc_max, asset_min, off_median]|[0.67, 0.32, 0.32]|
|2023-03-01T17:30:00.000+0000|[hpc_max, asset_min, off_median]|[0.54, 0.54, 0.3] |
|2023-03-01T17:20:00.000+0000|[hpc_max, asset_min, off_median]|[0.25, 0.34, 0.67]|
+----------------------------+--------------------------------+------------------+
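
Since the order comes from a list, you can also derive item_index from the list itself instead of hard-coding the when/otherwise chain. A minimal sketch, assuming a SparkSession named spark is available (item_order and order_df are just illustrative names):

item_order = ["hpc_max", "asset_min", "off_median"]

# Small lookup DataFrame mapping each item name to its
# 1-based position in the list
order_df = spark.createDataFrame(
    [(name, i) for i, name in enumerate(item_order, start=1)],
    ["item_name", "item_index"],
)

# Join it in; every row now carries its desired position
df_2 = df_1.join(order_df, on="item_name", how="left")

This way, reordering or extending the list only means editing item_order.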
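
One caveat: Spark does not strictly guarantee that collect_list preserves the row order from a preceding orderBy once the data is shuffled for the groupBy. A more defensive sketch collects structs and sorts them inside each group (result is an illustrative name):

from pyspark.sql import functions as F

# Collect (item_index, item_name, item_value) structs per timestamp,
# sort each group's array by item_index (the first struct field),
# then split the sorted fields back into parallel arrays
result = (
    df_2.groupBy("timestamp")
    .agg(
        F.sort_array(
            F.collect_list(F.struct("item_index", "item_name", "item_value"))
        ).alias("items")
    )
    .select(
        "timestamp",
        F.col("items.item_name").alias("item_name"),
        F.col("items.item_value").alias("item_value"),
    )
)
result.show(truncate=False)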

Upvotes: 0
