Devanshu Misra

Reputation: 813

What regular expression can I use in Scala to parse Apache access log files?

I am using the following regular expression in Scala:

val Pattern = """^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+)""".r

val res = Pattern.findFirstMatchIn(logFile)

Yet it is giving me the following error:

: Cannot parse log line: 80-219-148-207.dclient.hispeed.ch - - [07/Mar/2004:19:47:36 -0800] "OPTIONS * HTTP/1.0" 200 -

Upvotes: 0

Views: 761

Answers (1)

Wiktor Stribiżew

Reputation: 626903

The issue is that your regex expects the last field to be numeric (\d+, one or more digits), but in this log line it is a - (unknown or undefined). The earlier subpatterns worked because \S+ (one or more non-whitespace characters) also matches a hyphen.

So, either replace the last \d+ with \S+ or use alternation (\d+|-). This latter approach can be extended to all the pattern parts like this:

^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}|-) (\d+|-)
                                                                       ^^      ^^

See the regex demo.
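
For illustration, here is a minimal Scala sketch of how the adjusted pattern could be used. The object name `AccessLogDemo`, the helper `parse`, and the choice to return only host, status, and size are my own assumptions, not part of the original code; a `-` in either of the last two fields is mapped to `None`.

```scala
import scala.util.matching.Regex

object AccessLogDemo {
  // Same pattern as above: the status and size fields may each be a bare "-".
  val Pattern: Regex =
    """^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}|-) (\d+|-)""".r

  // Hypothetical helper: returns (host, status, size) for a matching line,
  // or None when the line does not match the pattern at all.
  def parse(line: String): Option[(String, Option[Int], Option[Long])] =
    Pattern.findFirstMatchIn(line).map { m =>
      // Group 8 is the status code and group 9 the response size; "-" means unknown.
      val status = Option(m.group(8)).filter(_ != "-").map(_.toInt)
      val size   = Option(m.group(9)).filter(_ != "-").map(_.toLong)
      (m.group(1), status, size)
    }
}
```

With the line from the question, `AccessLogDemo.parse` should yield the host, a status of `Some(200)`, and a size of `None` instead of failing to match.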

Upvotes: 1
