Reputation: 1357
I'm learning to write a simple parser combinator. I'm writing the rules from the bottom up and writing unit tests to verify them as I go. However, I'm stuck on using repsep() with whitespace as the separator.
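import scala.util.parsing.combinator.RegexParsers

object MyParser extends RegexParsers {
  // `~>` drops the left result and `<~` drops the right one, so only the list is kept
  lazy val listVal: Parser[List[String]] = elem('{') ~> repsep("""\d+""".r, """\s+""".r) <~ elem('}')
}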
The rule was simplified to illustrate the problem. When I feed the parser with "{1 2 3}", it always complains that it doesn't match:
[1.4] failure: `}' expected but 2 found
What is the correct way to write a rule like the one I described?
Thanks
Upvotes: 3
Views: 329
Reputation: 26486
By default, RegexParsers-derived parsers skip whitespace before attempting to match any terminal symbol. Unless your whitespace interpretation is unusual, you can just work with that. If the character sequences you wish to treat as ignorable whitespace are something other than the default (\s+), you can override the protected val whiteSpace: Regex = ... value in your RegexParsers parser. If you do not want any such whitespace skipping to occur, override def skipWhitespace = false.
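As a rough sketch of the second option, the parser below turns off automatic skipping so that the whitespace regex can work as a genuine separator. The object name ExplicitWhitespaceParser is made up for illustration, and the code assumes the scala-parser-combinators module is on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical parser: whitespace is significant and must be matched explicitly.
object ExplicitWhitespaceParser extends RegexParsers {
  // Disable the automatic skipping so that \s+ is no longer consumed
  // before each terminal and can be used as the repsep separator.
  override def skipWhitespace: Boolean = false

  lazy val listVal: Parser[List[String]] =
    elem('{') ~> repsep("""\d+""".r, """\s+""".r) <~ elem('}')
}
```

With skipping disabled, "{1 2 3}" should parse to List("1", "2", "3"), but input such as "{ 1 2 }" is rejected, because nothing consumes the space after the opening brace.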
Edit: So yes, changing this:
repsep("""\d+""".r,"""\s+""".r)
to this:
rep("""\d+""".r)
and leaving everything else defined in RegexParsers unchanged should do what you want.
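For reference, here is a minimal sketch of the whole rule with that change applied. The object name ListParser is hypothetical, the braces are written as `elem('{') ~> ... <~ elem('}')` so that only the list is kept, and the scala-parser-combinators module is assumed to be available.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical parser that relies on the default whitespace skipping.
object ListParser extends RegexParsers {
  // The regex terminal skips leading whitespace on every attempt,
  // so plain rep() is enough to read space-separated numbers.
  lazy val listVal: Parser[List[String]] =
    elem('{') ~> rep("""\d+""".r) <~ elem('}')
}
```

Calling ListParser.parseAll(ListParser.listVal, "{1 2 3}") should then succeed with List("1", "2", "3"). Note that elem does not skip whitespace itself, so a space directly before the closing brace would still be rejected in this form.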
By the way, repsep is commonly used for things like comma-separated lists, where you need to ensure the commas are there but don't need to keep them in the resulting parse tree (or AST).
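A small sketch of that typical use, with a made-up bracketed list syntax and the hypothetical name CommaListParser (again assuming the scala-parser-combinators module):

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical parser for input such as "[1, 2, 3]".
object CommaListParser extends RegexParsers {
  // repsep requires a "," between elements but discards it,
  // so the result holds only the matched numbers.
  lazy val numbers: Parser[List[String]] =
    "[" ~> repsep("""\d+""".r, ",") <~ "]"
}
```

Here the commas are mandatory, so "[1 2]" is rejected, while whitespace around the commas is still skipped by the default RegexParsers behaviour.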
Upvotes: 6