Reputation: 571
I have an RDD of a case class (TopNModel) and want to get the top N elements from the given RDD, sorted by tx + rx. When two elements have equal tx + rx, they should be sorted by mac.
case class TopNModel(mac: Long, tx: Int, rx: Int)
For example:
RDD[TopNModel(10L, 200, 100), TopNModel(12L, 100, 100), TopNModel(1L, 200, 400), TopNModel(11L, 100, 200)]
Sorted by tx + rx, then by mac:
RDD[TopNModel(1L, 200, 400), TopNModel(10L, 200, 100), TopNModel(11L, 100, 200), TopNModel(12L, 100, 100)]
My Question:
Upvotes: 0
Views: 805
Reputation: 37832
EDIT: per an important comment below, if indeed the requirement is to "get top N" entities based on this order, sortBy is wasteful compared to takeOrdered. Use the second solution ("alternative") with takeOrdered.
You can use the fact that tuples are naturally ordered from the "leftmost" element to the right, and create a tuple with the negative value of tx + rx (so that these are sorted in descending order) and the positive value of mac:
val result = rdd.sortBy { case TopNModel(mac, tx, rx) => (-(tx + rx), mac) }
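As a quick sanity check, the tuple key can be tried on a plain Scala Seq, without Spark, using the data from the question. This is only a sketch: the names TupleKeyDemo, byTrafficThenMac, sample and topN are made up for illustration, and summing as Long is an extra precaution against Int overflow that is not part of the line above.

```scala
// Plain-Scala check of the tuple sort key; no Spark is needed to see the ordering.
case class TopNModel(mac: Long, tx: Int, rx: Int)

object TupleKeyDemo {
  // Descending by tx + rx (hence the negation), then ascending by mac.
  // The sum is computed as Long so that large counters cannot overflow Int.
  val byTrafficThenMac: Ordering[TopNModel] =
    Ordering.by((m: TopNModel) => (-(m.tx.toLong + m.rx), m.mac))

  // Sample data taken from the question.
  val sample: Seq[TopNModel] = Seq(
    TopNModel(10L, 200, 100),
    TopNModel(12L, 100, 100),
    TopNModel(1L, 200, 400),
    TopNModel(11L, 100, 200)
  )

  // Sort the whole collection with the ordering and keep the first n elements.
  def topN(items: Seq[TopNModel], n: Int): Seq[TopNModel] =
    items.sorted(byTrafficThenMac).take(n)

  def main(args: Array[String]): Unit = {
    topN(sample, 2).foreach(println)
    // With an RDD, the same Ordering can be passed explicitly:
    //   rdd.takeOrdered(2)(byTrafficThenMac)
  }
}
```

For the sample data, the first two elements are the entry with tx + rx = 600 (mac 1) followed by the entry with mac 10, which matches the order expected in the question.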
Alternatively, if you want TopNModel to always be sorted this way (no matter the context), you can make it an Ordered and implement its compare method. Then, sorting by identity will use that compare to get the same result:
case class TopNModel(mac: Long, tx: Int, rx: Int) extends Ordered[TopNModel] {
  import scala.math.Ordered.orderingToOrdered
  def compare(that: TopNModel): Int =
    (-(tx + rx), mac) compare (-(that.tx + that.rx), that.mac)
}
val result = rdd.sortBy(identity)
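For the "top N" case mentioned in the EDIT, a minimal sketch of takeOrdered with this Ordered version could look like the following. It assumes Spark is on the classpath; the object name TopNExample, the helper topN, the local[*] master and the app name are placeholders chosen for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Same Ordered definition as above: descending tx + rx, then ascending mac.
case class TopNModel(mac: Long, tx: Int, rx: Int) extends Ordered[TopNModel] {
  import scala.math.Ordered.orderingToOrdered
  def compare(that: TopNModel): Int =
    (-(tx + rx), mac) compare (-(that.tx + that.rx), that.mac)
}

object TopNExample {
  // takeOrdered returns the n smallest elements under the implicit Ordering,
  // which here is derived from TopNModel's compare method.
  // The result is a local Array on the driver, not an RDD.
  def topN(rdd: RDD[TopNModel], n: Int): Array[TopNModel] = rdd.takeOrdered(n)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("top-n-sketch")
      .getOrCreate()
    try {
      val rdd = spark.sparkContext.parallelize(Seq(
        TopNModel(10L, 200, 100), TopNModel(12L, 100, 100),
        TopNModel(1L, 200, 400), TopNModel(11L, 100, 200)))
      topN(rdd, 2).foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

Because takeOrdered collects its result on the driver, it is suited to a small N; for a fully sorted distributed result, sortBy remains the option to use.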
Upvotes: 2