sk1007

Reputation: 571

How to sort the RDD and get top N elements using scala?

I have an RDD of a case class (TopNModel) and want to get the top N elements from it, sorted by tx + rx in descending order. When two elements have equal tx + rx, they should be sorted by mac.

case class TopNModel(mac: Long, tx: Int, rx: Int)

For example:

RDD[TopNModel(10L, 200, 100), TopNModel(12L, 100, 100), TopNModel(1L, 200, 400), TopNModel(11L, 100, 200)]

Sorted by tx + rx, then by mac:

RDD[TopNModel(1L, 200, 400), TopNModel(10L, 200, 100), TopNModel(11L, 100, 200), TopNModel(12L, 100, 100)]

My Question:

  1. How do I sort by rx + tx and, when those values are equal, fall back to sorting by mac?

Upvotes: 0

Views: 805

Answers (1)

Tzach Zohar

Reputation: 37832

EDIT: per an important comment below, if the requirement really is to "get the top N" elements in this order, sortBy is wasteful compared to takeOrdered, because it sorts the entire RDD when only N elements are needed. Use the second solution ("alternative") with takeOrdered.


You can use the fact that tuples are naturally ordered from the leftmost element to the rightmost, and sort by a tuple made of the negated value of tx + rx (so that the sums are sorted in descending order) and the positive value of mac (so that ties are broken by mac in ascending order):

val result = rdd.sortBy { case TopNModel(mac, tx, rx) => (-(tx + rx), mac) }
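
To see why the tuple key gives the order asked for in the question, here is a minimal sketch that applies the same key to a plain Scala Seq instead of an RDD. The names TupleKeyDemo, key, input and sorted are made up for this illustration; Seq.sortBy resolves the same implicit tuple Ordering that RDD.sortBy would use, so no Spark is needed to check the result.

```scala
case class TopNModel(mac: Long, tx: Int, rx: Int)

object TupleKeyDemo {
  // Hypothetical helper: negated total first (descending by tx + rx),
  // then mac (ascending) to break ties.
  def key(m: TopNModel): (Int, Long) = (-(m.tx + m.rx), m.mac)

  // The sample data from the question.
  val input: Seq[TopNModel] = Seq(
    TopNModel(10L, 200, 100),
    TopNModel(12L, 100, 100),
    TopNModel(1L, 200, 400),
    TopNModel(11L, 100, 200)
  )

  // Tuples compare element by element, left to right.
  val sorted: Seq[TopNModel] = input.sortBy(key)
}
```

With the sample data, the keys are (-300, 10), (-200, 12), (-600, 1) and (-300, 11), so mac 1 comes first, the two elements with a total of 300 follow in mac order (10, then 11), and mac 12 comes last, which matches the expected output in the question.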

Alternatively, if you want TopNModel to always be sorted this way (no matter the context), you can make it an Ordered and implement its compare method. Then, sorting by identity will use that compare to get the same result:

case class TopNModel(mac: Long, tx: Int, rx: Int) extends Ordered[TopNModel] {
  import scala.math.Ordered.orderingToOrdered
  def compare(that: TopNModel): Int = (-(tx + rx), mac) compare (-(that.tx + that.rx), that.mac)
}

val result = rdd.sortBy(identity)
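
For the "top N" case mentioned in the EDIT, a minimal sketch with takeOrdered might look like the following. It assumes Spark is on the classpath and uses the plain case class from the question with an explicit Ordering; TopNSketch, byTotalThenMac and topN are hypothetical names chosen for this example.

```scala
import org.apache.spark.rdd.RDD

case class TopNModel(mac: Long, tx: Int, rx: Int)

object TopNSketch {
  // Same order as the tuple key above: tx + rx descending, then mac ascending.
  val byTotalThenMac: Ordering[TopNModel] =
    Ordering.by((m: TopNModel) => (-(m.tx + m.rx), m.mac))

  // takeOrdered returns the first n elements under the given Ordering
  // as a local Array, without sorting the whole RDD.
  def topN(rdd: RDD[TopNModel], n: Int): Array[TopNModel] =
    rdd.takeOrdered(n)(byTotalThenMac)
}
```

If TopNModel extends Ordered as in the alternative above, rdd.takeOrdered(n) should pick up that ordering implicitly, so the explicit Ordering argument would not be needed. Note that the result is an Array collected on the driver, so this is only suitable when N is small enough to fit in driver memory.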

Upvotes: 2
