Reputation: 7605
I have the following file called weblogs.txt:
56.38.234.188 – 99788 "GET /KBDOC-00157.html HTTP/1.0" …
56.38.234.188 – 99788 "GET /theme.css HTTP/1.0" …
203.146.17.59 – 25254 "GET /KBDOC-00230.html HTTP/1.0" …
221.78.60.155 – 45402 "GET /titanic_4000_sales.html HTTP/1.0" …
65.187.255.81 – 14242 "GET /KBDOC-00107.html HTTP/1.0"
I would like to keep only the lines that contain the word "KBDOC".
This is what I have written so far, but without result:
val patt = "KBDOC".r
val kbreqs = sc.textFile("weblogs.txt").filter(line => line.contains(patt))
kbreqs.foreach(println)
But this prints nothing. What am I doing wrong?
Expected result:
56.38.234.188 – 99788 "GET /KBDOC-00157.html HTTP/1.0" …
203.146.17.59 – 25254 "GET /KBDOC-00230.html HTTP/1.0" …
65.187.255.81 – 14242 "GET /KBDOC-00107.html HTTP/1.0"
Edit based on Solutions [Solved]:
val patt: String = "KBDOC"
val kbreqs = sc.textFile("weblogs.txt").filter(line => line.contains(patt)).collect()
kbreqs.foreach(println)
Upvotes: 2
Views: 3385
Reputation: 13985
First of all, you are passing a Regex to a method that expects a String argument. The contains and matches methods of String both take a string-like argument, but they treat it differently: contains checks for a literal substring, while matches compiles its argument into a regex and returns true only if the entire String matches it.
The examples below show which combinations match:
val s = """56.38.234.188 – 99788 "GET /KBDOC-00157.html HTTP/1.0""""
val pattern1: String = "KBDOC"
s.contains(pattern1)
// true
// Note: contains is a literal substring check, so a regex pattern does not work with it
val pattern2: String = ".*KBDOC.*"
s.contains(pattern2)
// false
// Or,
val pattern3: String = ".*KBDOC.*"
s.matches(pattern3)
// true
// but this will be false
val pattern4: String = "KBDOC"
s.matches(pattern4)
// false
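To make the difference between the three checks concrete, here is a minimal, Spark-free sketch. The object name MatchSemantics and the value names are hypothetical, chosen only for illustration; the sample line comes from the question, with the elided trailing fields left out.

```scala
import scala.util.matching.Regex

object MatchSemantics {
  // Sample line from the question's weblogs.txt (elided trailing fields omitted)
  val line: String = "56.38.234.188 – 99788 \"GET /KBDOC-00157.html HTTP/1.0\""

  // String.contains: literal substring check, no regex interpretation
  val literalHit: Boolean = line.contains("KBDOC")
  val literalMiss: Boolean = line.contains(".*KBDOC.*")

  // String.matches: the argument is a regex that must match the whole string
  val fullMatch: Boolean = line.matches(".*KBDOC.*")
  val partialOnly: Boolean = line.matches("KBDOC")

  // Regex.findFirstIn: looks for the pattern anywhere in the input
  val kbdoc: Regex = "KBDOC".r
  val found: Boolean = kbdoc.findFirstIn(line).isDefined

  def main(args: Array[String]): Unit = {
    println(s"contains(KBDOC)     = $literalHit")
    println(s"contains(.*KBDOC.*) = $literalMiss")
    println(s"matches(.*KBDOC.*)  = $fullMatch")
    println(s"matches(KBDOC)      = $partialOnly")
    println(s"findFirstIn         = $found")
  }
}
```

If you only need a fixed word, contains is the simplest choice; a regex is only worth it when the pattern is more than a literal.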
Upvotes: 2
Reputation: 5210
You could just use a String instead of a Regex:
val patt: String = "KBDOC"
val kbreqs = sc.textFile("weblogs.txt").filter(line => line.contains(patt))
Or, if you'd like to use a regex pattern with matches:
val patt: String = ".*KBDOC.*"
val kbreqs = sc.textFile("weblogs.txt").filter(line => line.matches(patt))
Another version:
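import scala.util.matching.Regex
val patt: Regex = "KBDOC".r
// findFirstIn returns Some(match) as soon as the pattern occurs anywhere in the line
val kbreqs = sc.textFile("weblogs.txt").filter(line => patt.findFirstIn(line).isDefined)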
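As a rough way to check that the three variants agree without a Spark cluster, the same predicates can be tried on a local Seq. This is only a sketch: the object name FilterSketch and the in-memory lines value are assumptions standing in for sc.textFile("weblogs.txt"), and the elided trailing fields of each log line are left out.

```scala
import scala.util.matching.Regex

object FilterSketch {
  // Stand-in for the lines of weblogs.txt (assumption: trailing fields omitted)
  val lines: Seq[String] = Seq(
    "56.38.234.188 – 99788 \"GET /KBDOC-00157.html HTTP/1.0\"",
    "56.38.234.188 – 99788 \"GET /theme.css HTTP/1.0\"",
    "203.146.17.59 – 25254 \"GET /KBDOC-00230.html HTTP/1.0\"",
    "221.78.60.155 – 45402 \"GET /titanic_4000_sales.html HTTP/1.0\"",
    "65.187.255.81 – 14242 \"GET /KBDOC-00107.html HTTP/1.0\""
  )

  val kbdoc: Regex = "KBDOC".r

  // Variant 1: literal substring check
  val byContains: Seq[String] = lines.filter(_.contains("KBDOC"))

  // Variant 2: regex that has to cover the whole line
  val byMatches: Seq[String] = lines.filter(_.matches(".*KBDOC.*"))

  // Variant 3: Regex object searching anywhere in the line
  val byRegex: Seq[String] = lines.filter(line => kbdoc.findFirstIn(line).isDefined)

  def main(args: Array[String]): Unit =
    byContains.foreach(println)
}
```

Each predicate has the type String => Boolean, so the same function literals can be passed to filter on an RDD[String] unchanged.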
Upvotes: 2