user3661384
user3661384

Reputation: 544

PySpark - Convert Array Struct to Column Name the my Struct

I'm trying to convert my struct array by columns of my array, where I would like to separate the fields by ;

My data is this way

root
 |-- id_payment: string (nullable = true)
 |-- contents_json: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- id_seller: string (nullable = true)
 |    |    |-- tot_product: string (nullable = true)
 |    |    |-- tot_unit: long (nullable = true)
 |    |    |-- prizes: string (nullable = true)

+--------------------------------------------+
|contents_json                               |
+--------------------------------------------+
|[{7, 1, 0, 0}, {5, 1, 0, 0}, {5, 1, 0, 0}]  |
|[{7, 1, 0, 0}]                              |
+--------------------------------------------+

I would like to leave it like this

+-----------------------------------------------+
| id_seller| tot_product | tot_unit  | prizes   |
+-----------------------------------------------+
| 7;5;7    | 1;12;5      | 0;1;0     | 0;0;1    |
| 10       | 1           | 0         | 0        |
+-----------------------------------------------+

But I couldn't do this with ws_concat

I tried with this code, but it didn't work. What am I doing wrong

I tried this code, but instead of concatenating it broke into two lines. What I'm trying to do is to concatenate in the same cell these lines val1;val2;val3 and so on. How can I fix this?

sparkdf2.selectExpr("explode(contents_json) AS structCol").select(F.expr("concat_ws(',', structCol.id_campaign)").alias("id_campaign")).show()

enter image description here

Upvotes: 0

Views: 442

Answers (1)

Steven
Steven

Reputation: 15258

well, if you just select the columns, isn't it enough ?

df.show()
+----------+------------------------------------------+
|id_payment|contents_json                             |
+----------+------------------------------------------+
|1         |[[7, 1, 0, 0], [5, 1, 0, 0], [5, 1, 0, 0]]|
|2         |[[7, 1, 0, 0]]                            |
+----------+------------------------------------------+

df.select(
    "id_payment",
    "contents_json.id_seller",
    "contents_json.tot_product",
    "contents_json.tot_unit",
    "contents_json.prizes",
).show()
+----------+---------+-----------+---------+---------+
|id_payment|id_seller|tot_product| tot_unit|   prizes|
+----------+---------+-----------+---------+---------+
|         1|[7, 5, 5]|  [1, 1, 1]|[0, 0, 0]|[0, 0, 0]|
|         2|      [7]|        [1]|      [0]|      [0]|
+----------+---------+-----------+---------+---------+

You can use concat_ws if needed :

df.select(
    "id_payment",
    F.concat_ws(";","contents_json.id_seller").alias("id_seller"),
    "contents_json.tot_product",
    "contents_json.tot_unit",
    "contents_json.prizes",
).show()
+----------+---------+-----------+---------+---------+
|id_payment|id_seller|tot_product| tot_unit|   prizes|
+----------+---------+-----------+---------+---------+
|         1|    7;5;5|  [1, 1, 1]|[0, 0, 0]|[0, 0, 0]|
|         2|        7|        [1]|      [0]|      [0]|
+----------+---------+-----------+---------+---------+

Upvotes: 2

Related Questions