NikhiR

Reputation: 79

loop inside spark RDD filter

I am new to Spark and am trying to code in Scala. I have an RDD which consists of data in the form:

1: 2 3 5
2: 5 6 7 
3: 1 8 9
4: 1 2 4

and another list in the form [1,4,8,9]

I need to filter the RDD so that it keeps the lines in which either the value before the ':' is present in the list or any of the values after the ':' are present in the list.

I have written the following code:

val links = linksFile.filter(t => {
                        val l = t.split(": ")
                        root.contains(l(0).toInt) ||
                        for(x<-l(0).split(" ")){
                            root.contains(x.toInt)
                        }
                    })

linksFile is the RDD and root is the list.

But this doesn't work. Any suggestions?

Upvotes: 2

Views: 887

Answers (2)

Dima

Reputation: 40500

A for-comprehension without a yield doesn't ... well ... yield :) But you don't really need a for-comprehension (or any "loop", for that matter) here.
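
To make that concrete, here is a minimal sketch (using the root list from the question) of what each form evaluates to:

val root = List(1, 4, 8, 9)
val values = Seq(2, 3, 5)

// Without yield, the for-comprehension runs only for its side effects and returns Unit
val r1: Unit = for (x <- values) root.contains(x)

// With yield, it returns a Seq[Boolean], still not the single Boolean that filter needs
val r2: Seq[Boolean] = for (x <- values) yield root.contains(x)

// exists collapses the checks into one Boolean
val r3: Boolean = values.exists(x => root.contains(x))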

Something like this:

linksFile.map(
    _.split("[: ]+").map(_.toInt)    // "1: 2 3 5" -> Array(1, 2, 3, 5)
  ).filter(_.exists(root.toSet))     // a Set[Int] can serve as the Int => Boolean predicate
   .map(_.mkString(" "))             // rebuild the line (note: without the original ':')

should do it.

Upvotes: 0

Joe K

Reputation: 18424

You're close: the for-loop just doesn't actually use the value computed inside it. You should use the exists method instead. Also I think you want l(1), not l(0) for the second check:

val links = linksFile.filter(t => {
                        val l = t.split(": ")
                        root.contains(l(0).toInt) ||
                        l(1).split(" ").exists { x =>
                            root.contains(x.toInt)
                        }
                    })
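
For reference, a quick way to sanity-check this locally (a sketch that assumes a SparkContext named sc, e.g. from spark-shell, and the sample data from the question):

val root = List(1, 4, 8, 9)
val linksFile = sc.parallelize(Seq("1: 2 3 5", "2: 5 6 7", "3: 1 8 9", "4: 1 2 4"))

val links = linksFile.filter { t =>
  val l = t.split(": ")
  root.contains(l(0).toInt) || l(1).split(" ").exists(x => root.contains(x.toInt))
}

links.collect().foreach(println)
// 1: 2 3 5
// 3: 1 8 9
// 4: 1 2 4
// "2: 5 6 7" is dropped because neither 2 nor any of 5, 6, 7 is in root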

Upvotes: 3
