Reputation: 89
When applying a function such as reduceByKey, is there any way to specify a key other than the first element of the tuple?
My current solution is to use a map function to rearrange the tuple into the correct order, but I assume this additional operation comes at a computational cost, right?
Upvotes: 0
Views: 1490
Reputation: 3570
To use reduceByKey, you need a key-value RDD[(K, V)], where K is the key that will be used. If you have an RDD[V], you need to perform a map first to specify the key.
myRdd.map(x => (x, 1))
If you already have an RDD[(K, V)] where the key is not the one you want, you need another map. There is no way around this. For instance, to swap the key and the value, you could do the following:
myPairRdd.map(_.swap)
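As a minimal sketch of the re-keying step, assuming a local SparkContext and a hypothetical RDD of (id, category, amount) triples (the names Sale, totalsByCategory and keyedByCategory are illustrative, not from the question), the map below moves the second element into the key position before reduceByKey. keyBy is an alternative when you want to keep the whole record as the value. On the cost question: map is a narrow transformation that Spark pipelines within the same stage, so its overhead is usually small next to the shuffle that reduceByKey performs.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RekeyExample {
  // Hypothetical record layout: (id, category, amount)
  type Sale = (Int, String, Double)

  // Re-key by the second tuple element, then sum the amounts per category.
  def totalsByCategory(sales: RDD[Sale]): RDD[(String, Double)] =
    sales
      .map { case (_, category, amount) => (category, amount) }
      .reduceByKey(_ + _)

  // keyBy derives the key from each record and keeps the whole record as the value.
  def keyedByCategory(sales: RDD[Sale]): RDD[(String, Sale)] =
    sales.keyBy(_._2)

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying this out on one machine.
    val conf = new SparkConf().setAppName("rekey-example").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val sales = sc.parallelize(Seq((1, "books", 10.0), (2, "games", 5.0), (3, "books", 2.5)))
      totalsByCategory(sales).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```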
Upvotes: 3
Reputation: 2804
You can define an implicit Ordering with your own compare function and then call sortByKey:
implicit val sortFunction: Ordering[String] = new Ordering[String] {
  // Placeholder: natural string order; substitute your own comparison here
  override def compare(a: String, b: String): Int = a.compareTo(b)
}
val rddSet: RDD[(String, String)] = sc.parallelize(dataSet)
rddSet.sortByKey()
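A self-contained sketch of that idea, assuming a local SparkContext and a hypothetical ordering that puts shorter keys first and breaks ties alphabetically (byLengthThenAlpha, sortedPairs and the sample data are made up for illustration). Note that sortByKey only changes the order in which pairs are sorted; it does not change which tuple element acts as the key.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CustomSortExample {
  // Hypothetical ordering: shorter keys first, ties broken alphabetically.
  // Being in lexical scope, it takes precedence over the default String ordering.
  implicit val byLengthThenAlpha: Ordering[String] = new Ordering[String] {
    override def compare(a: String, b: String): Int = {
      val byLength = Integer.compare(a.length, b.length)
      if (byLength != 0) byLength else a.compareTo(b)
    }
  }

  // sortByKey picks up the implicit Ordering[String] defined above.
  def sortedPairs(pairs: RDD[(String, String)]): RDD[(String, String)] =
    pairs.sortByKey()

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying this out on one machine.
    val conf = new SparkConf().setAppName("custom-sort-example").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val dataSet = Seq(("b", "1"), ("aa", "2"), ("ccc", "3"), ("ab", "4"))
      sortedPairs(sc.parallelize(dataSet)).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```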
Upvotes: 0