Ekaterina Tcareva

Reputation: 439

Find the latest / earliest day in Spark RDD

I have an RDD m2 consisting of

case class Medication(patientID: String, date: Date, medicine: String)

and I need to find the first and the last day. I tried

val latest_date_m2  = m2.maxBy(_.date).date

I got:

No implicit Ordering defined for java.sql.Date.
[error]       val latest_date_m2 = m2.maxBy(_.date).date

It looks like Scala "does not know" how to compare the dates. I think I need to replace maxBy with a different function, but I cannot find one.

Upvotes: 0

Views: 245

Answers (1)

user11046693


Just provide the Ordering:

import scala.math.Ordering

object SQLDateOrdering extends Ordering[java.sql.Date] {
  def compare(a: java.sql.Date, b: java.sql.Date) = a compareTo b
}

m2.maxBy(_.date)(SQLDateOrdering)
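An alternative (not from the original answer, just a common idiom) is to make the ordering implicit so that maxBy picks it up without being passed explicitly, deriving it from the date's epoch-millisecond value with Ordering.by:

```scala
import java.sql.Date

case class Medication(patientID: String, date: Date, medicine: String)

// Implicit ordering derived from the epoch milliseconds of the Date;
// maxBy/minBy resolve it automatically from implicit scope.
implicit val sqlDateOrdering: Ordering[Date] = Ordering.by(_.getTime)

val m2 = Seq(
  Medication("p1", Date.valueOf("2019-03-01"), "aspirin"),
  Medication("p2", Date.valueOf("2019-05-20"), "ibuprofen")
)

val latest_date_m2 = m2.maxBy(_.date).date  // 2019-05-20
```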

Though it is worth noting that m2 cannot be an RDD, as RDD has no maxBy method (it is likely a Seq). If it were an RDD, you'd need

object MedicationDateOrdering extends Ordering[Medication] {
  def compare(a: Medication, b: Medication) = a.date compareTo b.date
}

and call max:

m2.max()(MedicationDateOrdering)
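Since the question asks for both the first and the last day, note that the same Ordering serves both ends: on an RDD you'd call m2.min()(MedicationDateOrdering) alongside max. A self-contained sketch of the idea, using a plain Seq with sample data so it runs without a Spark context:

```scala
import java.sql.Date

case class Medication(patientID: String, date: Date, medicine: String)

// Order Medications by their date field.
object MedicationDateOrdering extends Ordering[Medication] {
  def compare(a: Medication, b: Medication) = a.date compareTo b.date
}

val m2 = Seq(
  Medication("p1", Date.valueOf("2019-03-01"), "aspirin"),
  Medication("p2", Date.valueOf("2019-05-20"), "ibuprofen"),
  Medication("p3", Date.valueOf("2019-01-15"), "metformin")
)

// min and max both take the same ordering explicitly.
val earliest = m2.min(MedicationDateOrdering).date  // 2019-01-15
val latest   = m2.max(MedicationDateOrdering).date  // 2019-05-20
```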

Upvotes: 3
