Ignacio Alorre

Reputation: 7605

Scala - Filter lines in a document if a string/word is present

I have the following file called weblogs.txt:

56.38.234.188 – 99788 "GET /KBDOC-00157.html HTTP/1.0" …
56.38.234.188 – 99788 "GET /theme.css HTTP/1.0" …
203.146.17.59 – 25254 "GET /KBDOC-00230.html HTTP/1.0" …
221.78.60.155 – 45402 "GET /titanic_4000_sales.html HTTP/1.0" …
65.187.255.81 – 14242 "GET /KBDOC-00107.html HTTP/1.0"

I would like to keep only the lines that contain the word "KBDOC".

This is what I have written so far, but without result:

val patt = "KBDOC".r
val kbreqs = sc.textFile("weblogs.txt").filter(line => line.contains(patt))
kbreqs.foreach(println)

But this prints nothing. What am I doing wrong?

Expected result:

56.38.234.188 – 99788 "GET /KBDOC-00157.html HTTP/1.0" …
203.146.17.59 – 25254 "GET /KBDOC-00230.html HTTP/1.0" …
65.187.255.81 – 14242 "GET /KBDOC-00107.html HTTP/1.0"

Edit based on Solutions [Solved]:

val patt: String = "KBDOC"
val kbreqs = sc.textFile("weblogs.txt").filter(line => line.contains(patt)).collect()
kbreqs.foreach(println)

Upvotes: 2

Views: 3385

Answers (2)

sarveshseri

Reputation: 13985

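First of all, you are passing a Regex to a method that expects a String.

Both the contains and matches methods of String take a String argument, but they treat it differently. contains does a literal substring check, with no regex involved. matches compiles its argument as a regex and returns true only if the whole string matches, which is why the pattern needs a leading and trailing .* to find a word inside a line.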

So, for a line like this one, the calls below behave as shown:

val s = """56.38.234.188 – 99788 "GET /KBDOC-00157.html HTTP/1.0""""

val pattern1: String = "KBDOC"
s.contains(pattern1)
// true

// But not this: contains is a literal check, so a regex string is searched for verbatim
val pattern2: String = ".*KBDOC.*"
s.contains(pattern2)
// false

// Or,
val pattern3: String = ".*KBDOC.*"
s.matches(pattern3)
// true

// but this will be false
val pattern4: String = "KBDOC"
s.matches(pattern4)
// false
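
To tie this back to the question, here is a minimal sketch that applies both checks to a few sample lines held in a plain Scala List rather than an RDD, so it runs without Spark. The object name LogFilterSketch, the helper names, and the shortened sample lines are assumptions made for illustration; the same predicates can be passed to filter on an RDD.

```scala
object LogFilterSketch {
  // Sample lines, shortened from the question's weblogs.txt (illustrative only)
  val lines: List[String] = List(
    """56.38.234.188 - 99788 "GET /KBDOC-00157.html HTTP/1.0"""",
    """56.38.234.188 - 99788 "GET /theme.css HTTP/1.0"""",
    """203.146.17.59 - 25254 "GET /KBDOC-00230.html HTTP/1.0""""
  )

  // Literal substring check: no regex involved
  def byContains(word: String): List[String] =
    lines.filter(line => line.contains(word))

  // Whole-line regex match: the pattern must cover the entire line
  def byMatches(regex: String): List[String] =
    lines.filter(line => line.matches(regex))

  def main(args: Array[String]): Unit = {
    byContains("KBDOC").foreach(println)     // prints the two KBDOC lines
    byMatches(".*KBDOC.*").foreach(println)  // prints the same two lines
    println(byMatches("KBDOC").isEmpty)      // prints true: no line is exactly "KBDOC"
  }
}
```

For a fixed word, contains is the simpler choice; matches only pays off when the pattern really is a regex.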

Upvotes: 2

jamborta

Reputation: 5210

You could just use a String instead of a Regex:

val patt: String = "KBDOC"
val kbreqs = sc.textFile("weblogs.txt").filter(line => line.contains(patt))

Or if you'd like to use Regex:

val patt: String = ".*KBDOC.*"
val kbreqs = sc.textFile("weblogs.txt").filter(line => line.matches(patt))

Another version:

import scala.util.matching.Regex

val patt: Regex = "KBDOC".r
val kbreqs = sc.textFile("weblogs.txt").filter(line => patt.findAllIn(line).length > 0)
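
If you keep the Regex, findFirstIn can stop at the first hit, whereas counting the results of findAllIn walks through every match in the line. Below is a minimal sketch on plain strings that uses only the standard library; the object name RegexFilterSketch and the helper mentionsKb are made up for illustration.

```scala
import scala.util.matching.Regex

object RegexFilterSketch {
  val patt: Regex = "KBDOC".r

  // True if the pattern occurs anywhere in the line
  def mentionsKb(line: String): Boolean =
    patt.findFirstIn(line).isDefined

  def main(args: Array[String]): Unit = {
    println(mentionsKb("GET /KBDOC-00157.html HTTP/1.0")) // prints true
    println(mentionsKb("GET /theme.css HTTP/1.0"))        // prints false
  }
}
```

The same predicate can be used in the RDD filter, for example filter(line => patt.findFirstIn(line).isDefined).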

Upvotes: 2
