Reputation: 167
I have a Dataset in Spark (Java) that looks like this:
Current:
+--------------+--------------------+
| x            | YS                 |
+--------------+--------------------+
| x1           | [Y1,Y2]            |
| x2           | [Y3]               |
+--------------+--------------------+
I want to explode this Dataset so that each array element becomes its own row:
Desired:
+--------------+--------------------+
| x            | YS                 |
+--------------+--------------------+
| x1           | Y1                 |
| x1           | Y2                 |
| x2           | Y3                 |
+--------------+--------------------+
I read the table from a database and select the two columns, but I can't work out how to use the explode functionality.
DS = reader.option("table", "dummy").load()
.select(X,YS).explode(??)
How should I use explode to get the desired Dataset in Java?
Upvotes: 1
Views: 1437
Reputation: 13058
In principle, you need to select a new column (rather than the YS
column itself) whose value is the exploded YS column value. explode
produces one output row for each element of the array.
Starting from the code from the question, this would be something like:
import static org.apache.spark.sql.functions.explode;

Dataset<Row> ds = reader.option("table", "dummy").load();
ds = ds.select(ds.col("X"), explode(ds.col("YS")).as("Y"));
Here is the API doc: https://spark.apache.org/docs/2.4.6/api/java/org/apache/spark/sql/functions.html#explode-org.apache.spark.sql.Column-
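For intuition about what explode does to the rows, here is a minimal plain-Java sketch that mimics the behaviour with `flatMap`. It does not use Spark at all. The class name `ExplodeSketch`, its `explode` method, and the in-memory map standing in for the table are hypothetical and only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only: mimics the row-multiplying effect of
// Spark's explode using plain Java collections, not the Spark API.
public class ExplodeSketch {

    // For every (x, [y...]) input row, emit one (x, y) output row per element.
    // An input row with an empty list contributes no output rows.
    public static List<List<String>> explode(Map<String, List<String>> rows) {
        return rows.entrySet().stream()
                .flatMap(row -> row.getValue().stream()
                        .map(y -> Arrays.asList(row.getKey(), y)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Same data as the table in the question.
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("x1", Arrays.asList("Y1", "Y2"));
        rows.put("x2", Arrays.asList("Y3"));

        // Prints:
        // x1 | Y1
        // x1 | Y2
        // x2 | Y3
        explode(rows).forEach(r -> System.out.println(String.join(" | ", r)));
    }
}
```

The value of x is repeated for each element, which is why the Spark query selects X next to a new exploded column instead of keeping YS as it is.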
Upvotes: 1