Reputation: 179
Let's say that I have a list,
val list = List("""{"name":"abc","salary":"2000","id":"1","timeStamp" : "1528725600000"}""")
Let's assume that there are several rows coming in like this from Kafka or some other source.
I want to get the row with the latest timestamp. How do I do it?
Upvotes: 0
Views: 96
Reputation: 617
As with a few of your last questions, it should be clarified that you ought to start thinking of the data processed in Spark as collections with SQL-like functions provided.
You have some data in an RDD/DataFrame and you need to treat it either as an element of a Scala collection or as a row in a table, whichever is more suitable for you.
So, for both approaches, map() your collection to split the JSON into actual fields and use max() on the required field/column, as in the sketch below.
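For example, with the DataFrame API something like the following could work (a minimal sketch, assuming a local SparkSession named spark; here spark.read.json splits the JSON into columns instead of a manual map(), and the row with the largest timeStamp is kept):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("latest-row").getOrCreate()
import spark.implicits._

val list = List("""{"name":"abc","salary":"2000","id":"1","timeStamp" : "1528725600000"}""")

// Infer columns from the JSON strings, then keep the row with the largest timestamp.
val df = spark.read.json(list.toDS)
val latest = df.orderBy(col("timeStamp").cast("long").desc).limit(1)
latest.show()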
Upvotes: 0
Reputation: 473
First you need to parse your string. You can use Play JSON; add this dependency to your project:
"com.typesafe.play" %% "play-json" % "2.6.9"
Now, assuming you are not using any case class, you can parse the above string into a Map[String, String]. Do the following and you'll have your expected output.
import play.api.libs.json.Json

list.map(x => Json.parse(x).as[Map[String, String]])   // parse each JSON string into a Map
    .sortBy(y => y.getOrElse("timeStamp", "0").toLong) // sort ascending by the numeric timestamp
You'll get the list sorted by timestamp in ascending order; the last element of the list will be the latest record.
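If you only need the single latest record rather than the whole sorted list, maxBy on the parsed maps gives it directly (a small sketch under the same assumptions, simply swapping sortBy for maxBy):

import play.api.libs.json.Json

// Parse each JSON string into a Map and keep the one with the largest timestamp.
val latest = list
  .map(x => Json.parse(x).as[Map[String, String]])
  .maxBy(_.getOrElse("timeStamp", "0").toLong)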
Upvotes: 1