Reputation: 3111
I have an RDD[Sale] and want to keep only the latest sale for each id. To do that, I created a pair RDD and then grouped and filtered it:
val sales: RDD[(String, Sale)] = rawSales.map(sale => sale.id -> sale)
  .groupByKey()
  .mapValues(_.maxBy(_.timestamp))
But how do I get back to an RDD[Sale] instead of the pair RDD in this case?
The only way I have found is the following:
val value: RDD[Sale] = sales.map(salePaired => salePaired._2)
Is this the most appropriate solution?
Upvotes: 0
Views: 716
Reputation: 492
You can access the keys or values of a pair RDD directly, much as you would with any Map:
val keys: RDD[String] = sales.keys
val values: RDD[Sale] = sales.values
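Putting it together, here is a minimal sketch of the whole round trip from RDD[Sale] to a pair RDD and back. The Sale case class and the LatestSales object are hypothetical stand-ins, since the question only shows that a sale has an id and a timestamp. The sketch also swaps groupByKey for reduceByKey, which is not what the question uses but usually shuffles less data because it combines values per key before the shuffle. On older Spark versions you may also need import org.apache.spark.SparkContext._ for the pair RDD methods to be available.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical Sale type: the question only shows the `id` and `timestamp` fields.
case class Sale(id: String, timestamp: Long)

object LatestSales {
  // Keep the most recent Sale per id, then drop the keys again.
  def latest(rawSales: RDD[Sale]): RDD[Sale] =
    rawSales
      .map(sale => sale.id -> sale)                                     // RDD[(String, Sale)]
      .reduceByKey((a, b) => if (a.timestamp >= b.timestamp) a else b)  // one Sale per id
      .values                                                           // back to RDD[Sale]
}
```

Calling LatestSales.latest(rawSales) should give the same result as the map over salePaired._2 in the question; .values is just the shorter spelling.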
Upvotes: 1