Reputation: 4877
In Spark 1.3, is there a way to access the key from within mapValues?
Specifically, if I have
val y = x.groupBy(someKey)
val z = y.mapValues(someFun)
can someFun know which key of y it is currently operating on?
Or do I have to do
val y = x.map(r => (someKey(r), r)).groupBy(_._1)
val z = y.mapValues{ case (k, r) => someFun(r, k) }
Note: the reason I want to use mapValues rather than map is to preserve the partitioning.
Upvotes: 7
Views: 2121
Reputation: 1
In PySpark, you can apply zipWithIndex().map(lambda x: (x[1], x[0])).mapValues(...) after groupByKey(). Each element then has the form (index, (key, values)), so the function passed to mapValues receives the whole (key, values) pair.
Upvotes: 0
Reputation: 11284
In this case you can use mapPartitions with the preservesPartitioning attribute.
y.mapPartitions(it => it.map { case (k, rr) => (k, someFun(rr, k)) }, preservesPartitioning = true)
You just have to make sure you are not changing the partitioning, i.e. don't change the key.
Upvotes: 10
Reputation: 1838
You can't access the key from mapValues, but you can preserve the partitioning with mapPartitions.
import org.apache.spark.rdd.RDD

val pairs: RDD[(Int, Int)] = ???
pairs.mapPartitions({ it =>
  it.map { case (k, v) =>
    // your code: return (k, newValue), leaving the key k unchanged
    (k, v)
  }
}, preservesPartitioning = true)
Be careful to actually preserve the partitioning: the compiler cannot check that the keys are left unchanged.
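Since the compiler cannot help here, one option is to compare the partitioner field of the RDDs at runtime. Note that this only confirms the partitioner metadata was carried over, not that the keys were really left untouched. The following is a minimal sketch, assuming a local master; the object name, the grouping function (_ % 3), and the per-group computation (vs.sum + k) are made up for illustration and stand in for someKey and someFun.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object KeyAwareMapValues {
  def main(args: Array[String]): Unit = {
    // Assumption: a local two-thread master, just for the demonstration.
    val conf = new SparkConf().setAppName("key-aware-mapValues").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val x = sc.parallelize(1 to 10)

      // groupBy shuffles the data, so y carries a partitioner.
      val y = x.groupBy(_ % 3) // hypothetical someKey

      // The key k is visible inside the function and is returned unchanged.
      val z = y.mapPartitions(
        it => it.map { case (k, vs) => (k, vs.sum + k) }, // hypothetical someFun(vs, k)
        preservesPartitioning = true)

      // For comparison: a plain map discards the partitioner.
      val w = y.map { case (k, vs) => (k, vs.sum + k) }

      assert(y.partitioner.isDefined)
      assert(z.partitioner == y.partitioner)
      assert(w.partitioner.isEmpty)
    } finally {
      sc.stop()
    }
  }
}
```

If the function passed to mapPartitions did change the keys, the second assertion would still pass, because preservesPartitioning = true is a promise made by the caller rather than something Spark verifies.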
Upvotes: 4