Gaurav

Reputation: 1

How to search for multiple strings in Scala Spark

I am using

val my_data = sc.textFile("abc.txt")
val my_search = my_data.filter(x => x.contains("is","work"))

I am trying to keep only the lines of my RDD my_data that contain both "is" and "work".

Upvotes: 0

Views: 52

Answers (1)

Miguel

Reputation: 1211

If you know all the strings you want to filter beforehand (as in the example you gave), you can do the following:

my_data.filter(x => Seq("is", "work").forall(x.contains))
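Note that contains matches substrings, not words. Here is a minimal sketch of that behaviour, using a plain Scala Seq as a stand-in for the RDD so it runs without Spark; the object name, the containsAll helper, and the sample lines are made up for illustration:

```scala
object ContainsAllDemo {
  // Hypothetical helper: true when `line` contains every string in `terms` as a substring.
  def containsAll(line: String, terms: Seq[String]): Boolean =
    terms.forall(term => line.contains(term))

  // Made-up sample data standing in for the lines of abc.txt.
  val lines: Seq[String] = Seq(
    "this is hard work",
    "this is easy",
    "networking thesis"
  )

  // Same shape as my_data.filter(...), applied to a local collection.
  val matches: Seq[String] =
    lines.filter(line => containsAll(line, Seq("is", "work")))

  def main(args: Array[String]): Unit = matches.foreach(println)
}
```

The line "networking thesis" is kept as well, because "work" occurs inside "networking" and "is" occurs inside "thesis".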

Full words

If you want to match full words only, you will need to tokenize each line first. The easiest way to do it is line.split(" "). Be careful: this doesn't work for languages like Japanese or Chinese, which do not separate words with spaces.

my_data.filter { line => 
    val tokens = line.split(" ").toSet
    Seq("is", "work").forall(tokens.contains)
}
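Splitting on a single space also leaves punctuation attached to the tokens, so "work." would not match "work". One possible variation, sketched here on a plain Scala Seq rather than an RDD, splits on runs of non-word characters and lowercases the line first. The lowercasing, the object name, the containsAllWords helper, and the sample lines are my assumptions, not part of the question:

```scala
object WholeWordDemo {
  // Hypothetical helper: true when every word in `words` appears as a whole token in `line`.
  // Assumes `words` are already lowercase. "\\W+" splits on runs of non-word characters;
  // by default \W treats only ASCII letters, digits and '_' as word characters.
  def containsAllWords(line: String, words: Set[String]): Boolean = {
    val tokens = line.toLowerCase.split("\\W+").toSet
    words.subsetOf(tokens)
  }

  // Made-up sample data.
  val lines: Seq[String] = Seq(
    "This is hard work.",
    "networking thesis",
    "is it working?"
  )

  val matches: Seq[String] =
    lines.filter(line => containsAllWords(line, Set("is", "work")))

  def main(args: Array[String]): Unit = matches.foreach(println)
}
```

The same predicate could be passed to my_data.filter. Only the first sample line is kept here, since "networking", "thesis" and "working" are different tokens from "is" and "work".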

Upvotes: 2
