Reputation: 1311
I have an RDD that contains tuples like this:
(A, List(2,5,6,7))
(B, List(2,8,9,10))
and I would like to get the index of the first element where a specific condition between value and index holds. So far I have tried this on a single tuple test and it works fine:
test._2.zipWithIndex.indexWhere { case (v, i) => SOME_CONDITION}
I just can't figure out how to iterate over all the tuples in the RDD. I have tried:
val result= test._._2.zipWithIndex.indexWhere { case (v, i) => SOME_CONDITION}
Upvotes: 0
Views: 1240
Reputation: 37852
First, "iterate" is the wrong concept here. It comes from the realm of imperative programming, where you actually iterate over the data structure yourself. Spark uses a functional paradigm, which lets you pass a function to handle each record in the RDD (using a higher-order function such as map, foreach, ...).
In this case, it sounds like you want to map each element into a new element.
To map only the right-hand side of your tuples (without changing the left-hand side), you can use mapValues:
// mapValues will map the "values" (of type List[Int]) to new values (of type Int)
rdd.mapValues(list => list.zipWithIndex.indexWhere {
  case (v, i) => someCondition(v, i)
})
Alternatively, using plain map:
rdd.map {
  case (key, list) => (key, list.zipWithIndex.indexWhere {
    case (v, i) => someCondition(v, i)
  })
}
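To see what the per-record function produces, here is a minimal sketch that runs on a plain Scala Seq standing in for the RDD, so it needs no Spark setup. The condition (value equals three times its index) and the helper name firstMatch are made up for illustration and are not from the question. Note that indexWhere returns -1 when no element matches.

```scala
// Hypothetical condition, not from the question: value equals three times its index.
def someCondition(v: Int, i: Int): Boolean = v == 3 * i

// Per-record function: index of the first (value, index) pair that satisfies
// the condition, or -1 if none does.
def firstMatch(list: List[Int]): Int =
  list.zipWithIndex.indexWhere { case (v, i) => someCondition(v, i) }

// Local stand-in for the RDD's contents.
val records = Seq(("A", List(2, 5, 6, 7)), ("B", List(2, 8, 9, 10)))

// Same shape as the plain-map version above: the key is kept, the list is replaced by an index.
val result = records.map { case (key, list) => (key, firstMatch(list)) }
// result == Seq(("A", 2), ("B", -1))
```

On a real pair RDD, the same function should be usable as rdd.mapValues(firstMatch), assuming the RDD's element type is (String, List[Int]).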
Upvotes: 4