Reputation: 365
I am trying to implement an ISO 8601 date/time parser and am having trouble with the optional parts of the time. Here is a simplified example of the problem:
class ISO8601 extends RegexParsers {
  val hour = s"[0-9]{2}".r ^^ {_.toInt}
  val minute = s"[0-9]{2}".r ^^ {_.toInt}
  val timeSep = ":"
  val test = (hour ~ opt(timeSep ~> minute) |
              hour ~ opt(minute)) ^^ {
    case hh ~ mmOpt =>
      val mm = mmOpt.getOrElse(0)
      (hh, mm, 0, 0)
  }
}
What I want is to accept the following time formats: "23", "23:30" and "2330".
My parser successfully parses "23" and "23:30" but fails to parse "2330":
isoRes: iso.ParseResult[(Int, Int, Int, Int)] = [1.3] failure: string matching regex `\z' expected but `3' found
2330
Shouldn't the parser backtrack on that failure and try the second alternative (after "|")?
Upvotes: 0
Views: 974
Reputation: 93872
The problem is the opt() parser. First, I'm assuming you're calling it like this:
parseAll(ISO8601.test, new CharSequenceReader("2330"))
So what happens? parseAll tries to parse all the input in the reader, i.e. until no characters are left.
The test parser is applied: it tries the first alternative and parses "23". There is no separator, so the opt() parser returns None and the first alternative succeeds. Because | only tries its second alternative when the first one fails, the second alternative is never checked. The characters 3 and 0 are still in the reader, but the parser expected to be at the end of the input! That's why you get a failure.
Now try with:
println(ISO8601.parseAll(ISO8601.rep(ISO8601.test), new CharSequenceReader("2330")))
it outputs:
[1.5] parsed: List((23,0,0,0), (30,0,0,0))
so you can see that the first alternative was used twice.
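The same behaviour can be observed more directly by calling parse instead of parseAll and looking at how much input was consumed. The following is only a minimal sketch: the object name AlternationDemo and the names withSep, noSep, ordered and longest are made up for illustration, and it assumes the scala-parser-combinators library is on the classpath. It also contrasts | with the library's ||| combinator, which runs both alternatives and keeps the one that consumes more input.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical demo object; the two alternatives are copied from the question.
object AlternationDemo extends RegexParsers {
  val hour: Parser[Int] = "[0-9]{2}".r ^^ (_.toInt)
  val minute: Parser[Int] = "[0-9]{2}".r ^^ (_.toInt)

  val withSep = hour ~ opt(":" ~> minute)
  val noSep = hour ~ opt(minute)

  // Ordered choice: noSep is tried only if withSep fails.
  val ordered = withSep | noSep
  // Longest-match choice: both are run, the one consuming more input wins.
  val longest = withSep ||| noSep

  def main(args: Array[String]): Unit = {
    val partial = parse(ordered, "2330")
    println(partial.successful)                    // true: withSep matched "23"
    println(partial.next.offset)                   // 2: "30" is left unconsumed
    println(parseAll(ordered, "2330").successful)  // false
    println(parseAll(longest, "2330").successful)  // true
  }
}
```

Switching to ||| is one possible way out, although it runs both alternatives on every input; restructuring the grammar is usually cheaper.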
So how can you fix it? One alternative would be to make the minutes optional and the separator in the minutes optional too.
def test = hour ~ opt(opt(timeSep) ~> minute) map {
  case h ~ None => (h, 0, 0, 0)
  case h ~ Some(mm) => (h, mm, 0, 0)
}
Running it successively with "23", "23:30", "2330", you get:
[1.3] parsed: (23,0,0,0)
[1.6] parsed: (23,30,0,0)
[1.5] parsed: (23,30,0,0)
By the way, you should add some checks in the hour and minute parsers, otherwise "9999" is a valid input.
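One way such checks might look is sketched below, using the library's ^? combinator, which turns a successful parse into a failure when a partial function is not defined for the result. The object name CheckedTime, the helper twoDigitsUpTo, the bounds 23 and 59, and the error message wording are all assumptions made for this example, not part of the original code.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical variant of the parser above with range checks added.
object CheckedTime extends RegexParsers {
  // Assumption: whitespace inside a time string should not be skipped.
  override val skipWhitespace = false

  // Parses two digits and fails (without throwing) if the value exceeds max.
  private def twoDigitsUpTo(max: Int, what: String): Parser[Int] = {
    val inRange: PartialFunction[Int, Int] = { case n if n <= max => n }
    "[0-9]{2}".r ^^ (_.toInt) ^? (inRange, (n: Int) => s"$what out of range: $n")
  }

  val hour: Parser[Int] = twoDigitsUpTo(23, "hour")
  val minute: Parser[Int] = twoDigitsUpTo(59, "minute")

  // Same shape as the fix above: optional minutes with an optional separator.
  val time: Parser[(Int, Int, Int, Int)] =
    hour ~ opt(opt(":") ~> minute) ^^ {
      case h ~ mmOpt => (h, mmOpt.getOrElse(0), 0, 0)
    }
}
```

With this sketch, CheckedTime.parseAll(CheckedTime.time, "2330") should still succeed, while "9999" and "2360" should be rejected.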
Upvotes: 3
Reputation: 365
Well, I think I have figured it out myself.
The reason is that the first alternative succeeds after consuming only "23", so the test term matches "23" and the rest of the input is "30". The parser then expects the end of input but sees the remaining "30".
class ISO8601 extends RegexParsers {
  val hour = s"[0-9]{2}".r ^^ {_.toInt}
  val minute = s"[0-9]{2}".r ^^ {_.toInt}
  val timeSep = ":"
  // Order matters: | keeps the first alternative that succeeds,
  // so the hour-only case has to come last.
  val test = (hour ~ (timeSep ~> minute) |
              hour ~ minute |
              hour ~ success(0)) ^^ { case hh ~ mm => (hh, mm, 0, 0) }
}
However, if I add seconds and milliseconds, the term seems quite inelegant:
// Longer alternatives come first; the hour-only case is last because
// | keeps the first alternative that succeeds.
val time = (hour ~ (timeSep ~> minute) ~ (timeSep ~> second) ~ (msSep ~> ms) |
            hour ~ (timeSep ~> minute) ~ (timeSep ~> second) ~ success(0) |
            hour ~ (timeSep ~> minute) ~ success(0) ~ success(0) |
            hour ~ minute ~ second ~ (msSep ~> ms) |
            hour ~ minute ~ second ~ (msSep ~> success(0)) |
            hour ~ minute ~ second ~ success(0) |
            hour ~ minute ~ success(0) ~ success(0) |
            hour ~ success(0) ~ success(0) ~ success(0)
) ^^ {
  case hh ~ mm ~ ss ~ sss =>
    (hh, mm, ss, sss)
}
And I do not see a way to fix it.
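One possible way to shorten this is to nest opt() so that each optional part hangs off the previous one, and to parameterise the separator so the "with separator" and "without separator" forms share one definition. The following is only a sketch under several assumptions: the names IsoTime, twoDigits, millis and rest are invented here, the fraction separator is assumed to be "." and milliseconds are assumed to be exactly three digits, since the definitions of second, ms and msSep are not shown above.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical sketch: hh[:mm[:ss[.sss]]] or hh[mm[ss[.sss]]]
object IsoTime extends RegexParsers {
  // Assumption: whitespace inside a time string should not be skipped.
  override val skipWhitespace = false

  private val twoDigits: Parser[Int] = "[0-9]{2}".r ^^ (_.toInt)
  private val millis: Parser[Int] = "[0-9]{3}".r ^^ (_.toInt)

  // Minutes are mandatory here; seconds and milliseconds are nested options.
  // The same separator parser is used for minutes and seconds, so mixed
  // forms such as "23:3045" are not accepted.
  private def rest(sep: Parser[Any]) =
    (sep ~> twoDigits) ~ opt((sep ~> twoDigits) ~ opt("." ~> millis))

  // rest(literal(":")) fails when there is no colon, so | then tries the
  // separator-free form; success(()) is a separator that consumes nothing.
  val time: Parser[(Int, Int, Int, Int)] =
    twoDigits ~ opt(rest(literal(":")) | rest(success(()))) ^^ {
      case hh ~ None                        => (hh, 0, 0, 0)
      case hh ~ Some(mm ~ None)             => (hh, mm, 0, 0)
      case hh ~ Some(mm ~ Some(ss ~ msOpt)) => (hh, mm, ss, msOpt.getOrElse(0))
    }
}
```

For example, IsoTime.parseAll(IsoTime.time, "23:30:45.123") should yield (23,30,45,123) and "2330" should yield (23,30,0,0). Range checks on the individual fields are left out to keep the sketch short.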
Upvotes: 0
Reputation: 11518
I'd do it this way:
val time = "^([0-9]{2}):?([0-9]{0,2})$".r
def parse(str: String) = str match {
  // Convert the captured strings so the result is a tuple of Ints.
  case time(h, m) => (h.toInt, if (m.isEmpty) 0 else m.toInt, 0, 0)
}
parse("12")
parse("13:21")
parse("1456")
Upvotes: 0