Kirtiman Sinha

Reputation: 991

Spark: Dividing one array by elements in another

I am new to Apache Spark and Scala, and I am trying to understand something here:

I have one array:

val Companies = Array(
  ("Microsoft", 478953),
  ("IBM", 332042),
  ("JP Morgan", 226003),
  ("Google", 342033)
)

I wanted to divide this by another array, element by element:

val Count = Array(("Microsoft", 4), ("IBM", 3), ("JP Morgan", 2), ("Google", 3))

I used this code:

val result: Array[(String, Double)] = wordMapCount
  .zip(letterMapCount) // pair up corresponding elements by position
  .map { case ((letter, wc), (_, lc)) => (letter, lc.toDouble / wc) } // divide within each pair

I took this code from here: Divide Arrays. It works, but I do not understand it. Why does zip require the second array and not the first one? And how does the case matching work here?

Upvotes: 1

Views: 1043

Answers (1)

Yuval Itzchakov

Reputation: 149538

Why does zip require the second array and not the first one?

Because that's how zip works. It is invoked on the first RDD and takes the second one as an argument, pairing each element of the first with the corresponding element of the second:

def zip[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[(T, U)]
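
As a minimal sketch with plain Scala arrays (the collection zip has the same shape as the RDD signature above; the values are taken from the question, the lowercase names are only illustrative):

val companies = Array(("Microsoft", 478953), ("IBM", 332042))
val counts    = Array(("Microsoft", 4), ("IBM", 3))

// companies (the receiver) supplies the first element of each pair,
// counts (the argument) supplies the second
val zipped: Array[((String, Int), (String, Int))] = companies.zip(counts)
// zipped == Array(((Microsoft,478953),(Microsoft,4)), ((IBM,332042),(IBM,3)))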

How is the case matching working here?

You have two tuples:

(Microsoft,478953), (Microsoft,4)

What this partial function does is decompose the tuple type via a call to Tuple2.unapply. This:

case ((letter, wc), (_, lc))

Means "extract the first argument (_1) from the first tuple into a fresh value named letter, and the second argument (_2) to a fresh value named wc. Same goes for the second tuple. And then, it creates a new tuple with letter as the first value and the division of lc and wc as the second argument.

Upvotes: 1
