Reputation: 133
I'm new to Spark SQL and the Dataset / DataFrame API.
My Dataset has two columns that each hold multi-value / array data.
I want to step through the arrays in each row positionally, and output a new row for each pair of corresponding positional entries. You can see what I mean from the two tables below.
For example:
Input dataframe / dataset
+---+---------+-----+
| id| le|leloc|
+---+---------+-----+
| 1|[aaa,bbb]|[1,2]|
| 2|[ccc,ddd]|[3,4]|
+---+---------+-----+
Expected Output dataset
I need the output below, where the array elements are pivoted from columns into rows:
+---+---------+-----+
| id| le|leloc|
+---+---------+-----+
| 1|aaa |1 |
| 1|bbb |2 |
| 2|ccc |3 |
| 2|ddd |4 |
+---+---------+-----+
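In plain Python terms (just a sketch, not Spark code), the pairing I'm after is a per-row zip of the two arrays, emitting one output row per pair:

```python
# Sketch of the desired transformation: zip the two arrays of each row
# positionally, and produce one output row per (le, leloc) pair.
rows = [(1, ["aaa", "bbb"], [1, 2]), (2, ["ccc", "ddd"], [3, 4])]
exploded = [(rid, le, loc)
            for rid, les, locs in rows
            for le, loc in zip(les, locs)]
print(exploded)  # [(1, 'aaa', 1), (1, 'bbb', 2), (2, 'ccc', 3), (2, 'ddd', 4)]
```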
Upvotes: 1
Views: 246
Reputation: 18013
%python
from pyspark.sql.functions import expr, col, explode, split

# Generate some sample data
df1 = spark.createDataFrame([(1, ['A', 'B', 'X'], ['1', '2', '8']) for x in range(2)], ['value1', 'array1', 'array2'])
df2 = spark.createDataFrame([(2, ['C', 'D', 'Y'], ['3', '4', '9']) for x in range(2)], ['value1', 'array1', 'array2'])
df = df1.union(df2).distinct()

# From here on, specifically for you: transform (Spark 2.4+) pairs each
# element x of array1 with the element of array2 at the same index i.
col_temp_expr = "transform(array1, (x, i) -> concat(x, ',', array2[i]))"
dfA = df.withColumn("col_temp", expr(col_temp_expr))
dfB = dfA.select("value1", "array2", explode(col("col_temp")))  # exploded column is a string named 'col', not an array
dfC = dfB.withColumn('tempArray', split(dfB['col'], ','))       # split it back into an array
dfC.select("value1", dfC.tempArray[0], dfC.tempArray[1]).show()
returns:
+------+------------+------------+
|value1|tempArray[0]|tempArray[1]|
+------+------------+------------+
| 1| A| 1|
| 1| B| 2|
| 1| X| 8|
| 2| C| 3|
| 2| D| 4|
| 2| Y| 9|
+------+------------+------------+
You can rename the columns with alias. This example uses more elements per array to show it generalizes.
Upvotes: 0