Scala - How to iterate over tuples on RDD?

Question

I have an RDD that contains tuples like this

(A, List(2,5,6,7))

(B, List(2,8,9,10))

and I would like to get the index of the first element where a specific condition between value and index holds. So far I have tried this on a single tuple, test, and it works fine:

test._2.zipWithIndex.indexWhere { case (v, i) => SOME_CONDITION}

I just can't find how to iterate over all the tuples in the RDD. I have tried:

val result= test._._2.zipWithIndex.indexWhere { case (v, i) => SOME_CONDITION}

Tzach Zohar · Accepted Answer

First, "iterate" is the wrong concept here - it comes from the realm of imperative programming, where you actually iterate over the data structure yourself. Spark uses a functional paradigm, which lets you pass a function to handle each record in the RDD (using some higher-order function like map, foreach...).

In this case, it sounds like you want to map each element into a new element.

To map only the right-hand side of your tuples (without changing the left-hand side), you can use mapValues:

// mapValues will map the "values" (of type List[Int]) to new values (of type Int)
rdd.mapValues(list => list.zipWithIndex.indexWhere { 
  case (v, i) => someCondition(v, i) 
})
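To see what the function passed to mapValues does to a single list, here is a minimal sketch using plain Scala collections (no Spark needed). The condition `v > i + 5` and the names `IndexWhereSketch` and `firstMatchingIndex` are made up for illustration, since the question does not say what SOME_CONDITION is. Note that indexWhere returns -1 when no element matches.

```scala
object IndexWhereSketch {
  // Hypothetical condition between value and index, chosen only for illustration
  def someCondition(v: Int, i: Int): Boolean = v > i + 5

  // The per-record function one would hand to mapValues:
  // pair each value with its index, then find the first pair that matches
  def firstMatchingIndex(list: List[Int]): Int =
    list.zipWithIndex.indexWhere { case (v, i) => someCondition(v, i) }

  def main(args: Array[String]): Unit = {
    println(firstMatchingIndex(List(2, 5, 6, 7)))  // -1 (no element satisfies v > i + 5)
    println(firstMatchingIndex(List(2, 8, 9, 10))) // 1  (8 > 1 + 5)
  }
}
```

With that helper in place, the Spark call would reduce to something like rdd.mapValues(IndexWhereSketch.firstMatchingIndex).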

Or, alternatively, using plain map:

rdd.map { 
  case (key, list) => (key, list.zipWithIndex.indexWhere { 
    case (v, i) => someCondition(v, i) 
  }) 
}
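Both variants should give the same pairs. The sketch below runs them side by side on the sample data from the question, assuming spark-core is on the classpath and a local master is acceptable; the object name, the helper names, and the condition `v > i + 5` are assumptions made for the example, not part of the original question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MapValuesSketch {
  // Hypothetical condition, standing in for SOME_CONDITION
  def someCondition(v: Int, i: Int): Boolean = v > i + 5

  // Variant 1: mapValues only touches the right-hand side of each tuple
  def viaMapValues(rdd: RDD[(String, List[Int])]): RDD[(String, Int)] =
    rdd.mapValues(list => list.zipWithIndex.indexWhere { case (v, i) => someCondition(v, i) })

  // Variant 2: plain map, rebuilding the tuple by hand
  def viaMap(rdd: RDD[(String, List[Int])]): RDD[(String, Int)] =
    rdd.map { case (key, list) =>
      (key, list.zipWithIndex.indexWhere { case (v, i) => someCondition(v, i) })
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("map-values-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(("A", List(2, 5, 6, 7)), ("B", List(2, 8, 9, 10))))
      // collect() brings the results back to the driver as an Array
      println(viaMapValues(rdd).collect().toList) // List((A,-1), (B,1))
      println(viaMap(rdd).collect().toList)       // List((A,-1), (B,1))
    } finally {
      sc.stop()
    }
  }
}
```

One practical difference between the two: mapValues keeps the RDD's partitioner (if it has one), while map discards it, so mapValues is usually the better choice when the keys are left unchanged.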
