eirasf

Reputation: 89

Key-value pair order in Spark

When applying a function such as reduceByKey, is there any way to specify a key other than the first element of the tuple?

My current solution consists of using a map function to rearrange the tuple into the correct order, but I assume that this additional operation comes at a computational cost, right?
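
For illustration, something along these lines, assuming hypothetical 3-field records where the second field is the key I actually want (sc is just the usual SparkContext):

import org.apache.spark.rdd.RDD

// Hypothetical 3-field records where the second field is the key I actually want
val records: RDD[(String, Int, String)] = sc.parallelize(Seq(("a", 1, "x"), ("b", 1, "y"), ("c", 2, "z")))

// Current workaround: an extra map that moves that field into the key position
val keyed: RDD[(Int, (String, String))] = records.map { case (f1, key, f3) => (key, (f1, f3)) }

// ...only then can reduceByKey operate on the key I care about
val reduced = keyed.reduceByKey { case ((a1, a3), (b1, b3)) => (a1 + b1, a3 + b3) }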

Upvotes: 0

Views: 1490

Answers (2)

Balduz

Reputation: 3570

To use reduceByKey, you need a key-value RDD[K,V], where K is the key that will be used. If you have an RDD[V], you need to perform a map first to specify the key.

myRdd.map(x => (x, 1))
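
For example, a minimal sketch along those lines (the RDD contents and the constant 1 are just placeholders):

import org.apache.spark.rdd.RDD

// Hypothetical RDD[String]; each element becomes a key paired with a count of 1
val words: RDD[String] = sc.parallelize(Seq("a", "b", "a"))
val counts = words.map(x => (x, 1)).reduceByKey(_ + _) // RDD[(String, Int)]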

If you already have an RDD[K,V] where the key is not the one you want, you need another map; there is no way around this. For instance, if you want to switch your key and your value, you could do the following:

myPairRdd.map(_.swap)
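
Putting it together with reduceByKey, a rough sketch (myPairRdd is assumed here to be an RDD[(Int, String)]):

import org.apache.spark.rdd.RDD

// Hypothetical pair RDD where the second element is the key you want to reduce by
val myPairRdd: RDD[(Int, String)] = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "a")))

// Swap so the strings become the keys, then sum the counts per key
val reduced = myPairRdd.map(_.swap).reduceByKey(_ + _)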

Upvotes: 3

Carlos Vilchez

Reputation: 2804

You can define a custom Ordering for the keys and call sortByKey, which will pick it up implicitly:

import org.apache.spark.rdd.RDD

// Example compare logic; replace with whatever ordering you need
implicit val sortFunction: Ordering[String] = new Ordering[String] {
  override def compare(a: String, b: String): Int = a.compareTo(b)
}

// dataSet is assumed to be a Seq[(String, String)] defined elsewhere
val rddSet: RDD[(String, String)] = sc.parallelize(dataSet)

rddSet.sortByKey()

Upvotes: 0
