Imran

Reputation: 5642

How to select the first record from a group?

I have a list/array of records, and I'm using explode to extract data from the list. I'd like to pick the first record from the exploded result, using Spark SQL in Java.

import static org.apache.spark.sql.functions.explode;

// explode the array so each newrecord becomes its own row
Dataset<Row> ds = json.select(
  json.col("*"),
  explode(json.col("records.record.newrecord")).as("newrecord"));
ds = ds.select(ds.col("EVENT_SEQ"), ds.col("newrecord").apply("event").as("EVENTTYPE"));

Current Data:

+--------------------+---------+
|           EVENT_SEQ|EVENTTYPE|
+--------------------+---------+
|5a694d77-bc65-4bf...|        0|
|5a694d77-bc65-4bf...|        0|
+--------------------+---------+

Requirements:

+--------------------+---------+
|           EVENT_SEQ|EVENTTYPE|
+--------------------+---------+
|5a694d77-bc65-4bf...|        0|
+--------------------+---------+

I have seen documentation that suggests Column.apply for this purpose, but I haven't found enough help to get me started.

Upvotes: 2

Views: 180

Answers (1)

Jacek Laskowski

Reputation: 74619

That's a job for the groupBy operator with the first function.

val ds = Seq(
  ("5a694d77-bc65-4bf...", 0),
  ("5a694d77-bc65-4bf...", 0)
).toDF("EVENT_SEQ", "EVENTTYPE")
scala> ds.show
+--------------------+---------+
|           EVENT_SEQ|EVENTTYPE|
+--------------------+---------+
|5a694d77-bc65-4bf...|        0|
|5a694d77-bc65-4bf...|        0|
+--------------------+---------+

scala> ds.groupBy("EVENT_SEQ").agg(first("EVENTTYPE")).show
+--------------------+-----------------------+
|           EVENT_SEQ|first(EVENTTYPE, false)|
+--------------------+-----------------------+
|5a694d77-bc65-4bf...|                      0|
+--------------------+-----------------------+
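
Since the question is about the Java API, here is a minimal sketch of the same groupBy/first approach in Java, assuming ds already holds the EVENT_SEQ and EVENTTYPE columns from the question:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.first;

// Keep one row per EVENT_SEQ, taking the first EVENTTYPE seen in each group.
// alias(...) renames the aggregate column from "first(EVENTTYPE, false)" to "EVENTTYPE".
Dataset<Row> firstPerGroup = ds
    .groupBy(ds.col("EVENT_SEQ"))
    .agg(first(ds.col("EVENTTYPE")).alias("EVENTTYPE"));

firstPerGroup.show();

Note that first picks an arbitrary row within each group unless the data has been ordered beforehand, so impose a sort first if "first" has to mean something specific.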

Upvotes: 1
