Jay Vignesh

Reputation: 179

How to find the row with the latest timestamp from a List[Row] in Scala?

Let's say that I have a list:

val list = List("""{"name":"abc","salary":"2000","id":"1","timeStamp" : "1528725600000"}""") 

Let's assume that several rows like this are coming from Kafka or some other source.

I want to get the row with the latest timestamp. How do I do it?

Upvotes: 0

Views: 96

Answers (2)

gemelen

Reputation: 617

Judging by your last few questions, it is worth clarifying that you should start thinking of the data processed in Spark as collections that come with SQL-like functions.

You have some data in an RDD/DataFrame, and you can treat each item either as an element of a Scala collection or as a row in a table, whichever suits you better.

So, for both approaches, map() your collection to split the JSON into actual fields, then use max() on the required field/column.
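
A minimal sketch of that map-then-max idea on a plain Scala List, assuming every record is a flat JSON object whose timeStamp is a quoted string of digits, as in the question. The regex extraction only stands in for a real JSON parser, the second sample record is made up, and the names LatestRecord, timeStampOf and latest are illustrative rather than part of any library.

```scala
object LatestRecord {
  // Sample records shaped like the one in the question (the second is invented for illustration).
  val records: List[String] = List(
    """{"name":"abc","salary":"2000","id":"1","timeStamp" : "1528725600000"}""",
    """{"name":"def","salary":"3000","id":"2","timeStamp" : "1528729200000"}"""
  )

  // Assumption: timeStamp is always a quoted run of digits; a real JSON parser is safer.
  private val TimeStampField = "\"timeStamp\"\\s*:\\s*\"(\\d+)\"".r

  // Extract the timestamp as a Long, or None when the field is missing.
  def timeStampOf(json: String): Option[Long] =
    TimeStampField.findFirstMatchIn(json).map(_.group(1).toLong)

  // Pair each record with its timestamp, dropping records that have none.
  private val stamped: List[(Long, String)] =
    records.flatMap(r => timeStampOf(r).map(ts => (ts, r)))

  // maxBy throws on an empty list, so guard for that case explicitly.
  val latest: Option[String] =
    if (stamped.isEmpty) None else Some(stamped.maxBy(_._1)._2)
}
```

Here LatestRecord.latest holds the record with the largest timeStamp, or None if no record carried one. In Spark the same shape would apply to a DataFrame column instead of a List element.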

Upvotes: 0

geek94

Reputation: 473

First you need to parse your string. You can use Play JSON; add this dependency to your project:

"com.typesafe.play" %% "play-json" % "2.6.9"

Now, assuming you are not using any case class, you can parse the above string into a Map[String,String]. Do the following and you'll have your expected output.

    import play.api.libs.json.Json

    list.map(x => Json.parse(x).as[Map[String, String]])
      .sortBy(y => y.getOrElse("timeStamp", "0").toLong)

You'll get the list sorted by timestamp in ascending order; the last element of the list will be the latest record.
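
If only the latest record is needed, a full sort is not required. The sketch below assumes the records have already been parsed into Map[String,String] (for example by the Json.parse step above) and uses the standard collection method maxBy with the same key as the sortBy call. The sample maps and the names LatestByMaxBy, ts and latest are hypothetical.

```scala
object LatestByMaxBy {
  // Hypothetical records, already parsed into Map[String, String].
  val parsed: List[Map[String, String]] = List(
    Map("name" -> "abc", "id" -> "1", "timeStamp" -> "1528725600000"),
    Map("name" -> "def", "id" -> "2", "timeStamp" -> "1528729200000")
  )

  // Same key function as the sortBy version: a missing timeStamp counts as 0.
  def ts(record: Map[String, String]): Long =
    record.getOrElse("timeStamp", "0").toLong

  // maxBy throws on an empty list, so guard for that case explicitly.
  val latest: Option[Map[String, String]] =
    if (parsed.isEmpty) None else Some(parsed.maxBy(ts))
}
```

maxBy makes a single pass over the list, whereas sortBy orders every element first; both pick the same record when the timestamps are distinct.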

Upvotes: 1
