Reputation: 544
I'm trying to convert my struct array by columns of my array, where I would like to separate the fields by ;
My data is this way
root
|-- id_payment: string (nullable = true)
|-- contents_json: array (nullable = false)
| |-- element: struct (containsNull = false)
| | |-- id_seller: string (nullable = true)
| | |-- tot_product: string (nullable = true)
| | |-- tot_unit: long (nullable = true)
| | |-- prizes: string (nullable = true)
+--------------------------------------------+
|contents_json |
+--------------------------------------------+
|[{7, 1, 0, 0}, {5, 1, 0, 0}, {5, 1, 0, 0}] |
|[{7, 1, 0, 0}] |
+--------------------------------------------+
I would like to leave it like this
+-----------------------------------------------+
| id_seller| tot_product | tot_unit | prizes |
+-----------------------------------------------+
| 7;5;7 | 1;12;5 | 0;1;0 | 0;0;1 |
| 10 | 1 | 0 | 0 |
+-----------------------------------------------+
But I couldn't do this with ws_concat
I tried with this code, but it didn't work. What am I doing wrong
I tried this code, but instead of concatenating it broke into two lines. What I'm trying to do is to concatenate in the same cell these lines val1;val2;val3 and so on. How can I fix this?
sparkdf2.selectExpr("explode(contents_json) AS structCol").select(F.expr("concat_ws(',', structCol.id_campaign)").alias("id_campaign")).show()
Upvotes: 0
Views: 442
Reputation: 15258
well, if you just select the columns, isn't it enough ?
df.show()
+----------+------------------------------------------+
|id_payment|contents_json |
+----------+------------------------------------------+
|1 |[[7, 1, 0, 0], [5, 1, 0, 0], [5, 1, 0, 0]]|
|2 |[[7, 1, 0, 0]] |
+----------+------------------------------------------+
df.select(
"id_payment",
"contents_json.id_seller",
"contents_json.tot_product",
"contents_json.tot_unit",
"contents_json.prizes",
).show()
+----------+---------+-----------+---------+---------+
|id_payment|id_seller|tot_product| tot_unit| prizes|
+----------+---------+-----------+---------+---------+
| 1|[7, 5, 5]| [1, 1, 1]|[0, 0, 0]|[0, 0, 0]|
| 2| [7]| [1]| [0]| [0]|
+----------+---------+-----------+---------+---------+
You can use concat_ws
if needed :
df.select(
"id_payment",
F.concat_ws(";","contents_json.id_seller").alias("id_seller"),
"contents_json.tot_product",
"contents_json.tot_unit",
"contents_json.prizes",
).show()
+----------+---------+-----------+---------+---------+
|id_payment|id_seller|tot_product| tot_unit| prizes|
+----------+---------+-----------+---------+---------+
| 1| 7;5;5| [1, 1, 1]|[0, 0, 0]|[0, 0, 0]|
| 2| 7| [1]| [0]| [0]|
+----------+---------+-----------+---------+---------+
Upvotes: 2