SMC

Reputation: 33

Scala Join Arrays and Reduce on Value

I have two arrays of key/value pairs, each of type Array[(String, Int)], that I want to join. The result should contain only the keys present in both arrays, each paired with the minimum of its two values.

val a = Array(("personA", 1), ("personB", 4), ("personC", 5))
val b = Array(("personC", 4), ("personA", 2))

Goal: c = Array((personA, 1), (personC, 4))

val c = a.join(b).collect()

results in: c = Array((personA, (1, 2)), (personC, (5, 4)))

I've tried to achieve this using the join method, but I'm having difficulty reducing the values once they have been joined into a single array of type Array[(String, (Int, Int))].

Upvotes: 2

Views: 1138

Answers (1)

Mike Allen

Reputation: 8299

Try this:

val a = Array(("personA", 1), ("personB", 4), ("personC", 5))
val b = Array(("personC", 4), ("personA", 2))
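// Index b by key so each lookup is a map access.
val bMap = b.toMap
// Keep only the keys of a that also occur in b, then take the smaller value for each.
// Note: filterKeys is deprecated since Scala 2.13 in favour of .view.filterKeys(...).
val cMap = a.toMap.filterKeys(bMap.contains).map {
  case (k, v) => k -> Math.min(v, bMap(k))
}
val c = cMap.toArray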

The toMap method converts each Array[(String, Int)] into a Map[String, Int]; filterKeys then retains only the keys of a.toMap that are also present in bMap. The map operation picks the smaller of the two values for each remaining key and builds a new map associating each key with that minimum. Finally, toArray converts the resulting map back to an Array[(String, Int)]. Note that a Map does not guarantee iteration order, and that toMap keeps only the last value if a key appears more than once in an array.
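If you'd rather keep the order of a and avoid filterKeys altogether, here is a minimal sketch of the same idea using collect with a partial function directly on the array. The names JoinMin and joinMin are made up for illustration, and it still assumes keys are unique within each array (b.toMap keeps the last value for a repeated key).

```scala
object JoinMin {
  // Hypothetical helper: inner-joins two arrays on key and keeps the smaller value.
  def joinMin(a: Array[(String, Int)], b: Array[(String, Int)]): Array[(String, Int)] = {
    // Index b by key for fast lookups.
    val bMap = b.toMap
    // Walk a in its original order; skip keys that b lacks.
    a.collect {
      case (k, v) if bMap.contains(k) => k -> math.min(v, bMap(k))
    }
  }
}
```

With the arrays from the question, JoinMin.joinMin(a, b) should give Array((personA,1), (personC,4)), in the same order as the keys appear in a.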

UPDATED

BTW: I'm not sure where you got the join method from: Array has no such method, so a.join(b) doesn't compile for me. However, I suspect that a and b might actually be Apache Spark pair RDDs, i.e. RDD[(String, Int)] (or something similar). If that's the case, you can join a and b and then map each pair of values to its minimum (a reduce operation is not what you want) as follows:

a.join(b).mapValues(v => Math.min(v._1, v._2)).collect

collect then returns the elements of the resulting RDD to the driver as an Array[(String, Int)], as you require. The order of the elements is not guaranteed.
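For completeness, here is a rough, self-contained sketch of the Spark variant. It assumes spark-core is on the classpath and uses a local master purely for illustration; the object name SparkMinJoin, the helper minJoin and the app name are made up, and your own a and b may be created differently.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SparkMinJoin {
  // Hypothetical helper: inner join on key, then keep the smaller of the two values.
  def minJoin(a: RDD[(String, Int)], b: RDD[(String, Int)]): RDD[(String, Int)] =
    a.join(b).mapValues { case (x, y) => math.min(x, y) }

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local context, just to run the example.
    val conf = new SparkConf().setMaster("local[*]").setAppName("min-join-sketch")
    val sc = new SparkContext(conf)
    try {
      val a = sc.parallelize(Seq(("personA", 1), ("personB", 4), ("personC", 5)))
      val b = sc.parallelize(Seq(("personC", 4), ("personA", 2)))
      // collect brings the result back to the driver; sort it for stable output.
      val c: Array[(String, Int)] = minJoin(a, b).collect().sortBy(_._1)
      c.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The join only keeps keys present in both RDDs, so personB is dropped without any extra filtering.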

Upvotes: 2
