I have a list/array of records, and I'm using explode
to extract data from the list. I'd like to pick the first record from the exploded result, using Spark SQL in Java.
import static org.apache.spark.sql.functions.explode;

// "json" is the source Dataset<Row>; explode flattens the nested record array
Dataset<Row> ds = json.select(
        json.col("*"),
        explode(json.col("records.record.newrecord")).as("newrecord"));
ds = ds.select(ds.col("EVENT_SEQ"), ds.col("newrecord").apply("event").as("EVENTTYPE"));
Current Data:
+--------------------+---------+
|           EVENT_SEQ|EVENTTYPE|
+--------------------+---------+
|5a694d77-bc65-4bf...|        0|
|5a694d77-bc65-4bf...|        0|
+--------------------+---------+
Requirements:
+--------------------+---------+
|           EVENT_SEQ|EVENTTYPE|
+--------------------+---------+
|5a694d77-bc65-4bf...|        0|
+--------------------+---------+
I have seen documentation that suggests Column.apply
for this purpose, but I haven't found enough help to get me started.
That's certainly a job for the groupBy
operator with the first
aggregate function.
// Sample data mirroring the question
val ds = Seq(
  ("5a694d77-bc65-4bf...", 0),
  ("5a694d77-bc65-4bf...", 0)
).toDF("EVENT_SEQ", "EVENTTYPE")
scala> ds.show
+--------------------+---------+
|           EVENT_SEQ|EVENTTYPE|
+--------------------+---------+
|5a694d77-bc65-4bf...|        0|
|5a694d77-bc65-4bf...|        0|
+--------------------+---------+
scala> ds.groupBy("EVENT_SEQ").agg(first("EVENTTYPE")).show
+--------------------+-----------------------+
|           EVENT_SEQ|first(EVENTTYPE, false)|
+--------------------+-----------------------+
|5a694d77-bc65-4bf...|                      0|
+--------------------+-----------------------+
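Since the question uses the Java API, here is a minimal Java sketch of the same aggregation (assuming ds is the exploded Dataset<Row> built in the question; deduped is just an illustrative name):
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.first;

// Keep one row per EVENT_SEQ, taking the first EVENTTYPE seen in each group.
Dataset<Row> deduped = ds
        .groupBy(ds.col("EVENT_SEQ"))
        .agg(first(ds.col("EVENTTYPE")).as("EVENTTYPE"));
deduped.show();
The .as("EVENTTYPE") alias restores the original column name instead of the generated first(EVENTTYPE, false) header.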