Counting the number of occurrences of an Array element in an RDD

Question

I have an RDD, RDD1, of key-value pairs of type (String, Array[String]) (I will refer to them as (X, Y)), and an array Z of type Array[String]. For every element of Z, I'm trying to count how many instances of each X have that element in Y. I want my output as ((X, Z(i)), #ofinstances).

RDD1 = ((A, (2, 3, 4)), (B, (4, 4, 4)), (A, (4, 5)))
Z = (1, 4)

Then I want to get:

(((A, 4), 2), ((B, 4), 1))

Hope that made sense. As you can see above, I only want an element if there is at least one occurrence.

I have tried this so far:

val newRDD = RDD1.map{case(x, y) => for(i <- 0 to (z.size-1)){if(y.contains(z(i))) {((x, z(i)), 1)}}}

My output here is an RDD[Unit].

I'm not sure if what I'm asking for is even possible, or if I have to do it another way.
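
For reference, the RDD[Unit] result is what Scala gives for this attempt: a for loop without yield returns Unit, and so does an if without an else. One possible way around it, shown here only as a minimal sketch, is to build the matching pairs with for ... yield inside flatMap and then sum the 1s with reduceByKey. The object name CountOccurrences, the helper pairsFor, and the local SparkSession setup are hypothetical names chosen for illustration and assume Spark is on the classpath. The sample data is the example from the question, written as strings.

```scala
import org.apache.spark.sql.SparkSession

object CountOccurrences {
  // Hypothetical helper: emits ((x, zi), 1) for every element zi of z found in y.
  // Elements of z that do not occur in y are dropped by the guard.
  def pairsFor(x: String, y: Array[String], z: Array[String]): Seq[((String, String), Int)] =
    for (zi <- z.toSeq if y.contains(zi)) yield ((x, zi), 1)

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session, just to make the sketch runnable.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("count-occurrences")
      .getOrCreate()
    val sc = spark.sparkContext

    // Sample data from the question.
    val rdd1 = sc.parallelize(Seq(
      ("A", Array("2", "3", "4")),
      ("B", Array("4", "4", "4")),
      ("A", Array("4", "5"))
    ))
    val z = Array("1", "4")

    val counts = rdd1
      .flatMap { case (x, y) => pairsFor(x, y, z) } // zero or more pairs per record
      .reduceByKey(_ + _)                           // sum the 1s for each (x, zi) key

    counts.collect().foreach(println)
    spark.stop()
  }
}
```

With the sample data this should yield the pairs ((A,4),2) and ((B,4),1), in no guaranteed order. Note that a record counts once per element of Z however many times that element repeats inside Y, which matches the (B, 4) count of 1 in the desired output.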
