Reputation: 107
I'm new to PySpark and trying to solve an ETL step.
I have the schema below. I would like to take the variable field that is inside the array and turn it into a column, but when I do this with explode I create duplicate rows, because each row's array holds several elements (positions [0], [1], [2], ...).
My goal is to collect every variable value across the array elements into a single new column, as one string with the values separated by commas.
root
|-- id: string (nullable = true)
|-- info: array (nullable = true)
| |-- element: struct (containsNull = false)
| | |-- variable: string (nullable = true)
Output:
| id | new column |
|---|---|
| 123435e5x-9a9z | A, B, D |
| 555585a4Z-0B1Y | A |
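A minimal reproducible example of this data (a sketch built to match the schema and the expected output above, using a standard Spark session):
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# each struct in the array is written as a one-element tuple
df = spark.createDataFrame(
    [
        ("123435e5x-9a9z", [("A",), ("B",), ("D",)]),
        ("555585a4Z-0B1Y", [("A",)]),
    ],
    "id string, info array<struct<variable:string>>",
)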
Thank you for the help.
Upvotes: 1
Views: 174
Reputation: 350
As mentioned by David Markovitz, you can use the concat_ws function. Note that info is an array of structs, so you first need to extract the variable field; dot notation (info.variable) does this across the whole array, yielding an array of strings:
from pyspark.sql import functions as F

# info.variable pulls the struct field out across the whole array,
# producing an array<string> that concat_ws joins into one string
df = df.withColumn('new column', F.concat_ws(', ', F.col('info.variable')))
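With the question's sample data this gives the expected result (output sketched by hand):
df.select('id', 'new column').show(truncate=False)
# +--------------+----------+
# |id            |new column|
# +--------------+----------+
# |123435e5x-9a9z|A, B, D   |
# |555585a4Z-0B1Y|A         |
# +--------------+----------+
An equivalent option is F.array_join(F.col('info.variable'), ', '), which also joins an array of strings with a separator.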
Upvotes: 1