Reputation: 33
I have two arrays of key/value pairs, each of type Array[(String, Int)], that I want to join, keeping only the keys that appear in both and returning the minimum value for each matching key.
val a = Array(("personA", 1), ("personB", 4), ("personC", 5))
val b = Array(("personC", 4), ("personA", 2))
Goal: c = Array((personA, 1), (personC, 4))
I've tried to achieve this using the join method, but I'm having difficulty reducing the values after they have been joined into a single array of type Array[(String, (Int, Int))]:
val c = a.join(b).collect()
results in: c = Array((personA, (1, 2)), (personC, (5, 4)))
Upvotes: 2
Views: 1138
Reputation: 8299
Try this:
val a = Array(("personA", 1), ("personB", 4), ("personC", 5))
val b = Array(("personC", 4), ("personA", 2))
val bMap = b.toMap
val cMap = a.toMap.filterKeys(bMap.contains).map {
  case (k, v) => k -> Math.min(v, bMap(k))
}
val c = cMap.toArray
The toMap method converts the Array[(String, Int)] into a Map[String, Int]; filterKeys is then used to retain only the keys (strings) in a.toMap that are also in b.toMap. The map operation then chooses the minimum of the two available values for each key and creates a new map associating each key with that minimum value. Finally, we convert the resulting map back to an Array[(String, Int)] using toArray.
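As a possible variation on the same idea, here is a minimal sketch that skips the intermediate map for a and uses collect with a guard instead. This is my own rewrite rather than anything the question requires: it assumes plain Scala arrays (not Spark), and it keeps the element order of a, which the Map-based version does not guarantee. It also avoids filterKeys, which is deprecated as of Scala 2.13.

```scala
val a = Array(("personA", 1), ("personB", 4), ("personC", 5))
val b = Array(("personC", 4), ("personA", 2))

// Lookup table for the values in b, keyed by name.
val bMap = b.toMap

// Keep only the pairs of a whose key also appears in b,
// pairing each such key with the smaller of the two values.
val c: Array[(String, Int)] = a.collect {
  case (k, v) if bMap.contains(k) => (k, math.min(v, bMap(k)))
}
```

If a key occurs more than once in b, toMap keeps only the last value for that key, so this sketch assumes keys are unique within each array.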
UPDATED
By the way, I'm not sure where you get the Array.join method from: Array doesn't have such a method, so a.join(b) doesn't work for me. However, I suspect that a and b might be Apache Spark PairRDD collections (or something similar). If that's the case, then you can join a and b, then map each pair of values to its minimum (a reduce operation is not what you want), as follows:
a.join(b).mapValues(v => Math.min(v._1, v._2)).collect
collect converts the result into an Array[(String, Int)] as you require.
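To illustrate what the mapValues step does to each joined pair, here is a small sketch using a plain Scala array that has the same shape as the joined result shown in the question. It does not use Spark at all; the joined value below is hard-coded from the question's output purely for illustration.

```scala
// Same shape as the result of a.join(b).collect() in the question.
val joined: Array[(String, (Int, Int))] =
  Array(("personA", (1, 2)), ("personC", (5, 4)))

// Replace each pair of values with its minimum, leaving the key untouched.
val c: Array[(String, Int)] = joined.map {
  case (k, (x, y)) => (k, math.min(x, y))
}
```

This is a per-element transformation, which is why a map over the values fits here and a reduce does not.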
Upvotes: 2