Reputation: 991
I am new to Apache Spark and Scala. I am trying to understand something here:
I have one array:
val companies = Array(
  ("Microsoft", 478953),
  ("IBM", 332042),
  ("JP Morgan", 226003),
  ("Google", 342033)
)
I wanted to divide this by another array, element by element:
val counts = Array(("Microsoft", 4), ("IBM", 3), ("JP Morgan", 2), ("Google", 3))
I used this code:
val result: Array[(String, Double)] = wordMapCount
.zip(letterMapCount)
.map { case ((letter, wc), (_, lc)) => (letter, lc.toDouble / wc) }
From here: Divide Arrays. This works, but I do not understand it. Why does zip require the second array and not the first one? Also, how does the case matching work here?
Upvotes: 1
Views: 1043
Reputation: 149538
Why does zip require the second array and not the first one?
Because that's how zip works. It is defined as a method on the first RDD
instance and takes the second one as an argument, creating pairs of corresponding elements from the two:
def zip[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[(T, U)]
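The question actually zips plain Scala Arrays, where zip works the same way: it is a method on the first array and takes the second as its argument. Here is a minimal sketch of what the pairing produces (variable names are the lowercased ones from the question):
val companies = Array(("Microsoft", 478953), ("IBM", 332042))
val counts = Array(("Microsoft", 4), ("IBM", 3))

// zip pairs element i of companies with element i of counts
val zipped: Array[((String, Int), (String, Int))] = companies.zip(counts)
// zipped: Array(((Microsoft,478953),(Microsoft,4)), ((IBM,332042),(IBM,3)))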
How does the case matching work here?
You have two tuples:
(Microsoft, 478953), (Microsoft, 4)
What this partial function does is decompose each tuple via a call to Tuple2.unapply
. This:
case ((letter, wc), (_, lc))
Means "extract the first argument (_1
) from the first tuple into a fresh value named letter
, and the second argument (_2
) to a fresh value named wc
. Same goes for the second tuple. And then, it creates a new tuple with letter
as the first value and the division of lc
and wc
as the second argument.
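Putting it together with the question's data, here is a minimal end-to-end sketch (same lowercased variable names as above; the lc.toDouble call forces floating-point division):
val companies = Array(("Microsoft", 478953), ("IBM", 332042), ("JP Morgan", 226003), ("Google", 342033))
val counts = Array(("Microsoft", 4), ("IBM", 3), ("JP Morgan", 2), ("Google", 3))

val result: Array[(String, Double)] = companies
  .zip(counts)
  .map { case ((letter, wc), (_, lc)) => (letter, lc.toDouble / wc) }

// The pattern is just sugar for tuple accessors; this is equivalent:
val result2: Array[(String, Double)] = companies
  .zip(counts)
  .map(p => (p._1._1, p._2._2.toDouble / p._1._2))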
Upvotes: 1