Reputation: 1
I am using
val my_data = sc.textFile("abc.txt")
val my_search = my_data.filter(x => x.contains("is","work"))
where I am trying to keep only the lines that contain both "is" and "work" in my RDD my_data.
Upvotes: 0
Views: 52
Reputation: 1211
If you know all the strings you want to filter beforehand (as in the example you gave), you can do the following:
my_data.filter(x => Seq("is", "work").forall(x.contains))
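To see what this substring check keeps, here is a minimal sketch that runs on a plain Seq instead of an RDD, so it needs no Spark setup. The object name, the containsAll helper, and the sample lines are made up for illustration; the same predicate could be passed to my_data.filter.

```scala
object ContainsAllSketch {
  // Hypothetical helper: true when `line` contains every string in `terms` as a substring.
  def containsAll(line: String, terms: Seq[String]): Boolean =
    terms.forall(term => line.contains(term))

  // Made-up sample lines standing in for the contents of abc.txt.
  val sampleLines: Seq[String] = Seq(
    "this is hard work",
    "homework was easy",
    "this homework"
  )

  // Keeps the first and third lines: "this" contains "is" and "homework" contains "work".
  val matches: Seq[String] =
    sampleLines.filter(line => containsAll(line, Seq("is", "work")))
}
```

Note that "this homework" is kept even though neither "is" nor "work" appears as a separate word, because contains matches substrings.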
Full words
If you want to match full words only, you will need to tokenize each line first. The easiest way to do it is with line.split(" "). Be careful, as this doesn't work for languages like Japanese or Chinese, which do not separate words with spaces.
my_data.filter { line =>
  val tokens = line.split(" ").toSet
  Seq("is", "work").forall(tokens.contains)
}
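Splitting on a single space treats "work." or "Work" as different tokens from "work". A possible variation, sketched below under the assumption that you want case-insensitive matching and want punctuation ignored, lowercases the line and splits on runs of non-word characters instead. The object and helper names are hypothetical, and the terms are assumed to be lowercase. By default \W treats anything outside ASCII letters, digits, and underscore as a separator, so this is also unsuitable for non-English text.

```scala
object WholeWordSketch {
  // Hypothetical helper: true when every term appears as a whole token of `line`.
  // Assumes the terms are already lowercase.
  def containsAllWords(line: String, terms: Seq[String]): Boolean = {
    // Split on runs of non-word characters so "work." and "work" give the same token.
    val tokens = line.toLowerCase.split("\\W+").toSet
    terms.forall(tokens.contains)
  }
}
```

With an RDD this could be used as my_data.filter(line => WholeWordSketch.containsAllWords(line, Seq("is", "work"))).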
Upvotes: 2