eirasf

Reputation: 89

Key-value pair order in Spark

When applying a function such as reduceByKey, is there any way to specify a key other than the first element of the tuple?

My current solution consists of using a map function to rearrange the tuple into the correct order, but I assume that this additional operation comes at a computational cost, right?
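
For illustration, something along these lines, assuming hypothetical 3-field records where the second field is the key I actually want (sc is just the usual SparkContext):

import org.apache.spark.rdd.RDD

// Hypothetical 3-field records where the second field is the key I actually want
val records: RDD[(String, Int, String)] = sc.parallelize(Seq(("a", 1, "x"), ("b", 1, "y"), ("c", 2, "z")))

// Current workaround: an extra map that moves that field into the key position
val keyed: RDD[(Int, (String, String))] = records.map { case (f1, key, f3) => (key, (f1, f3)) }

// ...only then can reduceByKey operate on the key I care about
val reduced = keyed.reduceByKey { case ((a1, a3), (b1, b3)) => (a1 + b1, a3 + b3) }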

Upvotes: 0

Views: 1490

Answers (2)

Balduz

Reputation: 3570

To use reduceByKey, you need a key-value RDD[K,V], where K is the key that will be used. If you have an RDD[V], you need to perform a map first to specify the key.

myRdd.map(x => (x, 1))
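
For example, a minimal sketch along those lines (the RDD contents and the constant 1 are just placeholders):

import org.apache.spark.rdd.RDD

// Hypothetical RDD[String]; each element becomes a key paired with a count of 1
val words: RDD[String] = sc.parallelize(Seq("a", "b", "a"))
val counts = words.map(x => (x, 1)).reduceByKey(_ + _) // RDD[(String, Int)]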

If you already have an RDD[K,V] where the key is not the one you want, you need another map; there is no way around this. For instance, if you want to switch your key and your value, you could do the following:

myPairRdd.map(_.swap)
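
Putting it together with reduceByKey, a rough sketch (myPairRdd is assumed here to be an RDD[(Int, String)]):

import org.apache.spark.rdd.RDD

// Hypothetical pair RDD where the second element is the key you want to reduce by
val myPairRdd: RDD[(Int, String)] = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "a")))

// Swap so the strings become the keys, then sum the counts per key
val reduced = myPairRdd.map(_.swap).reduceByKey(_ + _)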

Upvotes: 3

Carlos Vilchez

Reputation: 2804

You can define a custom Ordering for the keys and call sortByKey, which will pick it up implicitly:

import org.apache.spark.rdd.RDD

// Example compare logic; replace with whatever ordering you need
implicit val sortFunction: Ordering[String] = new Ordering[String] {
  override def compare(a: String, b: String): Int = a.compareTo(b)
}

// dataSet is assumed to be a Seq[(String, String)] defined elsewhere
val rddSet: RDD[(String, String)] = sc.parallelize(dataSet)

rddSet.sortByKey()

Upvotes: 0
