mitchus

Reputation: 4877

Access key from mapValues or flatMapValues?

In Spark 1.3, is there a way to access the key from mapValues?

Specifically, if I have

val y = x.groupBy(someKey)
val z = y.mapValues(someFun)

can someFun know which key of y it is currently operating on?

Or do I have to do

val y = x.map(r => (someKey(r), r)).groupBy(_._1)
val z = y.mapValues{ case (k, r) => someFun(r, k) }

Note: the reason I want to use mapValues rather than map is to preserve the partitioning.

Upvotes: 7

Views: 2121

Answers (3)

Darshan

Reputation: 1

After groupByKey(), you can apply zipWithIndex().map(lambda x: (x[1], x[0])).mapValues(...). The function passed to mapValues then receives the whole (key, value) pair.

Upvotes: 0

Marius Soutier

Reputation: 11284

In this case you can use mapPartitions with the preservesPartitioning parameter set to true.

y.mapPartitions(it => it.map { case (k, rr) => (k, someFun(rr, k)) }, preservesPartitioning = true)

You just have to make sure you are not changing the partitioning, i.e. don't change the key.
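To make this concrete, here is a minimal sketch of the whole flow in local mode, assuming Spark is on the classpath. The bodies of someKey and someFun are hypothetical stand-ins (the question does not define them), and the object name and the local[2] master are likewise just illustration. The assertions check at runtime that the grouped RDD's partitioner is carried over.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object KeyAwareMapValues {
  // Hypothetical stand-ins for someKey and someFun from the question.
  def someKey(r: Int): Int = r % 3
  def someFun(rs: Iterable[Int], k: Int): Int = rs.sum + k

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("key-aware-mapValues").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val x = sc.parallelize(1 to 10)
      // groupBy shuffles the data, so y carries a partitioner.
      val y = x.groupBy(someKey _)
      // Each key is emitted unchanged, so keeping y's partitioner is valid.
      val z = y.mapPartitions(
        it => it.map { case (k, rs) => (k, someFun(rs, k)) },
        preservesPartitioning = true)

      // Runtime check: the partitioner is defined and identical to y's.
      assert(z.partitioner.isDefined)
      assert(z.partitioner == y.partitioner)
      z.collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With preservesPartitioning left at its default of false, z.partitioner would be None, and a later join or reduceByKey on z would shuffle again.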

Upvotes: 10

Lomig Mégard

Reputation: 1838

You can't access the key from mapValues, but you can preserve the partitioning with mapPartitions.

import org.apache.spark.rdd.RDD

val pairs: RDD[(Int, Int)] = ???
pairs.mapPartitions({ it =>
  it.map { case (k, v) =>
    // your code; return a pair that keeps the key k unchanged
    (k, v)
  }
}, preservesPartitioning = true)

Be careful to actually preserve the partitioning: the compiler cannot check that the keys are left unchanged.

Upvotes: 4
