Imran

Reputation: 5642

How to select the first record from a group?

I have a list/array of records, and I'm using explode to extract data from the list. I'd like to pick the first record from the exploded result, using Spark SQL in Java.

import static org.apache.spark.sql.functions.explode;

// explode the array so each newrecord becomes its own row
Dataset<Row> ds = json.select(
  json.col("*"),
  explode(json.col("records.record.newrecord")).as("newrecord"));
ds = ds.select(ds.col("EVENT_SEQ"), ds.col("newrecord").apply("event").as("EVENTTYPE"));

Current Data:

+--------------------+---------+
|           EVENT_SEQ|EVENTTYPE|
+--------------------+---------+
|5a694d77-bc65-4bf...|        0|
|5a694d77-bc65-4bf...|        0|
+--------------------+---------+

Requirements:

+--------------------+---------+
|           EVENT_SEQ|EVENTTYPE|
+--------------------+---------+
|5a694d77-bc65-4bf...|        0|
+--------------------+---------+

I have seen documentation that suggests Column.apply for this purpose, but I haven't found enough help to get me started.

Upvotes: 2

Views: 180

Answers (1)

Jacek Laskowski

Reputation: 74619

That's a job for the groupBy operator with the first function.

val ds = Seq(
  ("5a694d77-bc65-4bf...", 0),
  ("5a694d77-bc65-4bf...", 0)
).toDF("EVENT_SEQ", "EVENTTYPE")
scala> ds.show
+--------------------+---------+
|           EVENT_SEQ|EVENTTYPE|
+--------------------+---------+
|5a694d77-bc65-4bf...|        0|
|5a694d77-bc65-4bf...|        0|
+--------------------+---------+

scala> ds.groupBy("EVENT_SEQ").agg(first("EVENTTYPE")).show
+--------------------+-----------------------+
|           EVENT_SEQ|first(EVENTTYPE, false)|
+--------------------+-----------------------+
|5a694d77-bc65-4bf...|                      0|
+--------------------+-----------------------+
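
Since the question is about the Java API, here is a minimal sketch of the same groupBy/first approach in Java, assuming ds already holds the EVENT_SEQ and EVENTTYPE columns from the question:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.first;

// Keep one row per EVENT_SEQ, taking the first EVENTTYPE seen in each group.
// alias(...) renames the aggregate column from "first(EVENTTYPE, false)" to "EVENTTYPE".
Dataset<Row> firstPerGroup = ds
    .groupBy(ds.col("EVENT_SEQ"))
    .agg(first(ds.col("EVENTTYPE")).alias("EVENTTYPE"));

firstPerGroup.show();

Note that first picks an arbitrary row within each group unless the data has been ordered beforehand, so impose a sort first if "first" has to mean something specific.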

Upvotes: 1
