samba

Reputation: 3101

Scala - how to filter a nested collection structure?

Here are my case classes:

import java.sql.Timestamp

case class Metric(
  id: Long,
  name: String,
  features: List[Feature]
)

case class Feature(
  featureId: Long,
  name: String,
  value: String,
  processTime: Timestamp
)

Each Metric has a List[Feature]. I want to filter each Metric so that its List[Feature] contains only the latest Feature for each featureId (the one with the most recent processTime).

I've tried the following, but it returns just a List[immutable.Iterable[Feature]]. The Features in it are filtered correctly, but I need it to return a List[Metric] with the filtered Feature lists inside.

val f1 = Feature(1, "f1", "v1", Timestamp.valueOf("2019-07-01 00:00:00"))
val f2 = Feature(1, "f2", "v2", Timestamp.valueOf("2019-07-05 00:00:00"))
val f3 = Feature(2, "f3", "v3", Timestamp.valueOf("2019-03-07 00:00:00"))
val f4 = Feature(2, "f4", "v4", Timestamp.valueOf("2019-03-10 00:00:00"))

val metric1 = Metric(1, "m1", List(f1, f2, f3, f4))
val metric2 = Metric(1, "m1", List(f3, f4))

val metricsList = List(metric1, metric2)

val newMetrics = metricsList.map(m =>
  m.features.groupBy(_.featureId).map { case (featureId, features) =>
    features.reduce { (a: Feature, b: Feature) =>
      if (a.processTime.after(b.processTime)) a else b
    }
  })

Update: the expected output is List(metric1, metric2), where

val metric1 = Metric(1, "m1", List(f2, f4))
val metric2 = Metric(1, "m1", List(f4))
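The attempt above already selects the right features; it only loses the surrounding Metric, because the outer map returns whatever the inner lambda returns (an Iterable[Feature]). A minimal sketch that keeps the same reduce rule and rebuilds each Metric with the case-class copy method could look like this. The helper name `later` and the final sort by featureId are assumptions added for illustration and for a deterministic order, since groupBy does not guarantee one.

```scala
import java.sql.Timestamp

case class Feature(featureId: Long, name: String, value: String, processTime: Timestamp)
case class Metric(id: Long, name: String, features: List[Feature])

// Hypothetical helper with the same rule as the attempt:
// of two features, keep the one processed later.
def later(a: Feature, b: Feature): Feature =
  if (a.processTime.after(b.processTime)) a else b

val f1 = Feature(1, "f1", "v1", Timestamp.valueOf("2019-07-01 00:00:00"))
val f2 = Feature(1, "f2", "v2", Timestamp.valueOf("2019-07-05 00:00:00"))
val f3 = Feature(2, "f3", "v3", Timestamp.valueOf("2019-03-07 00:00:00"))
val f4 = Feature(2, "f4", "v4", Timestamp.valueOf("2019-03-10 00:00:00"))

val metricsList = List(Metric(1, "m1", List(f1, f2, f3, f4)), Metric(1, "m1", List(f3, f4)))

// Reduce each featureId group to its latest Feature, then copy the Metric
// with the new list instead of returning the bare Iterable[Feature].
val newMetrics: List[Metric] = metricsList.map { m =>
  val latest = m.features
    .groupBy(_.featureId)
    .values
    .map(group => group.reduce(later(_, _)))
    .toList
    .sortBy(_.featureId) // groupBy order is unspecified; sort for a stable result
  m.copy(features = latest)
}
```

Only the `m.copy(...)` wrapper is really new here; the grouping and reducing are the same as in the question.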

Upvotes: 0

Views: 186

Answers (1)

Matt Fowler

Reputation: 2733

You can do this with the copy method that every case class gets: it creates a new Metric instance with the filtered features. You can also use maxBy instead of reduce; for that you'll need to supply an implicit Ordering[Timestamp] so that timestamps can be compared. The code below should do what you're looking for:

import java.sql.Timestamp

// Lets maxBy compare Timestamps by delegating to Timestamp.compareTo.
implicit def ordered: Ordering[Timestamp] = new Ordering[Timestamp] {
  def compare(x: Timestamp, y: Timestamp): Int = x compareTo y
}

val newMetrics = metricsList.map { m =>
  // Keep the Feature with the latest processTime for each featureId.
  val features = m.features.groupBy(_.featureId).values.map(_.maxBy(_.processTime))
  m.copy(features = features.toList)
}
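To check the answer end to end, here is a self-contained sketch of the same approach (implicit Ordering[Timestamp], maxBy, copy) run against the question's sample data. The names `timestampOrdering` and `latestPerFeature` are hypothetical, and the sort by featureId is an added assumption so the result order matches the expected output reliably.

```scala
import java.sql.Timestamp

case class Feature(featureId: Long, name: String, value: String, processTime: Timestamp)
case class Metric(id: Long, name: String, features: List[Feature])

// Same Ordering as in the answer: delegates to Timestamp.compareTo.
implicit val timestampOrdering: Ordering[Timestamp] = new Ordering[Timestamp] {
  def compare(x: Timestamp, y: Timestamp): Int = x.compareTo(y)
}

// Hypothetical helper: keeps only the latest Feature per featureId.
def latestPerFeature(m: Metric): Metric = {
  val latest = m.features
    .groupBy(_.featureId)
    .values
    .map(_.maxBy(_.processTime)) // resolved with timestampOrdering
    .toList
    .sortBy(_.featureId) // groupBy gives no ordering guarantee; sort for stable output
  m.copy(features = latest)
}

val f1 = Feature(1, "f1", "v1", Timestamp.valueOf("2019-07-01 00:00:00"))
val f2 = Feature(1, "f2", "v2", Timestamp.valueOf("2019-07-05 00:00:00"))
val f3 = Feature(2, "f3", "v3", Timestamp.valueOf("2019-03-07 00:00:00"))
val f4 = Feature(2, "f4", "v4", Timestamp.valueOf("2019-03-10 00:00:00"))

val metricsList = List(Metric(1, "m1", List(f1, f2, f3, f4)), Metric(1, "m1", List(f3, f4)))
val newMetrics = metricsList.map(latestPerFeature)
```

Pulling the per-metric logic into a named function keeps the outer `map` trivial and makes the transformation easy to unit-test on a single Metric.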

Upvotes: 2
