Programmer

Reputation: 167

How to explode an array column in Spark Java with a Dataset

I have a Dataset in Spark Java that looks like this.

Current:

+--------------+--------------------+
|          x   |               YS   |
+--------------+--------------------+
|x1            |   [Y1,Y2]          |
|x2            |   [Y3]             |
+--------------+--------------------+

I want to explode this Dataset and convert the array into individual entries, like this:

Desired:

+--------------+--------------------+
|          x   |               YS   |
+--------------+--------------------+
|x1            |   Y1               |
|x1            |   Y2               |
|x2            |   Y3               |
+--------------+--------------------+

I read the table from a database and select the two columns, but I am unable to use the explode functionality.

DS = reader.option("table", "dummy").load()
                .select(X,YS).explode(??)

How should I use explode to get the desired Dataset in Java?

Upvotes: 1

Views: 1437

Answers (1)

Tim

Reputation: 13058

In principle, you need to select a new column (rather than the YS column itself) whose value is the exploded YS column value.

Starting from the code in the question, this would be something like:

import static org.apache.spark.sql.functions.explode;

Dataset<Row> ds = reader.option("table", "dummy").load();
ds = ds.select(ds.col("X"), explode(ds.col("YS")).as("Y"));

Here is the API doc: https://spark.apache.org/docs/2.4.6/api/java/org/apache/spark/sql/functions.html#explode-org.apache.spark.sql.Column-
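
If it helps to see what explode does to the rows, here is a rough plain-Java model of the semantics, with no Spark involved. It is only a sketch: the class name ExplodeSketch, the explode method, and the use of a Map keyed by x (which assumes each x value appears once) are made up for illustration and are not part of the Spark API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExplodeSketch {

    // Hypothetical helper: emits one {x, y} pair per element of each array.
    // An x whose array is empty contributes no output rows at all.
    static List<String[]> explode(Map<String, List<String>> input) {
        List<String[]> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : input.entrySet()) {
            for (String y : row.getValue()) {
                out.add(new String[] {row.getKey(), y});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same data as the "Current" table in the question.
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put("x1", List.of("Y1", "Y2"));
        input.put("x2", List.of("Y3"));

        // Prints:
        // x1 | Y1
        // x1 | Y2
        // x2 | Y3
        for (String[] r : explode(input)) {
            System.out.println(r[0] + " | " + r[1]);
        }
    }
}
```

One thing worth checking against the API doc linked above: as far as I know, Spark's explode likewise produces no output row when the array is empty or null, and explode_outer in the same functions class is the variant that keeps such rows with a null value.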

Upvotes: 1
