Reputation: 665
While parsing some complex text, where I need to split out regular expression definitions for reuse and readability, I often end up with Scala code of this general structure (pn is a regex pattern, vn a variable):
val cp1 = p1 ~ p2 ~ p3 ~ p4 ~ p5 ~ p6 ^^
  { case dummy1 ~ v2 ~ dummy3 ~ v4 ~ dummy5 ~ v6 => ACaseClass(v2, v4, v6) }
The obvious issue is readability and maintainability: the useful matches (vn) are separated from the placeholder ones (dummyn), so every newly inserted pattern means editing both the parser expression and the case pattern.
So, is there a neater way to express the intent? Could I use _ instead of every dummyn?
In the SNOBOL language, one could write (pat . var) or (pat $ var), which would assign the result of the match to the variable; similarly, in the latest regex syntax we have named capture groups (?P<name>pat). The intent is clearly to keep the match capture variable close to the pattern.
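For comparison, here is a minimal sketch of named capture groups as they look from Scala via java.util.regex, where the syntax is (?<name>pat) rather than (?P<name>pat). The pattern, the group names first and second, and the NamedGroups object are made up for illustration only.

```scala
import java.util.regex.Pattern

object NamedGroups {
  // Hypothetical pattern: two numbers in parentheses, each captured under a name.
  private val pair: Pattern =
    Pattern.compile("""\((?<first>\d+),(?<second>\d+)\)""")

  // Returns both captures as Ints, or None when the whole input does not match.
  def parse(s: String): Option[(Int, Int)] = {
    val m = pair.matcher(s)
    if (m.matches()) Some((m.group("first").toInt, m.group("second").toInt))
    else None
  }

  def main(args: Array[String]): Unit =
    println(parse("(42,345)"))
}
```

Each name sits next to the sub-pattern it captures, which is the property I am after in the parser combinator version.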
So, what I would like to write is something along the general lines of:
val cp1 = p1 ~ ( p2 $$ v2 ) ~ p3 ~ ( p4 $$ v4 ) ~ p5 ~ ( p6 $$ v6 ) $=>
ACaseClass(v2, v4, v6)
Obviously I am assuming some sort of new operators $$ and $=> which enable this simpler syntax.
Conceivably macros could help, but they are rather beyond my abilities at present. Any input welcome!
Upvotes: 0
Views: 97
Reputation: 28670
Why didn't you try using _? It turns out that it works. You can also use ~> and <~ to discard parts of your pattern, although you'll need parentheses if you want to discard inner parts.
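import scala.util.parsing.combinator.JavaTokenParsers

object SimpleScala extends JavaTokenParsers {
  // ~> and <~ drop the delimiters, so only the three numbers reach the case
  def test = "(" ~> wholeNumber ~ ("," ~> wholeNumber <~ ",") ~ wholeNumber <~ ")" ^^
    { case i1 ~ i2 ~ i3 => (i1, i2, i3) }
  // _ matches each delimiter and ignores it
  def test2 = "(" ~ wholeNumber ~ "," ~ wholeNumber ~ ")" ^^
    { case _ ~ i1 ~ _ ~ i2 ~ _ => (i1, i2) }
  def main(args: Array[String]): Unit = {
    println(parseAll(test, "(42,34,5)"))
    println(parseAll(test2, "(42,345)"))
  }
}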
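Applied to the shape in the question, the same operators might look like the sketch below. Triple is a hypothetical stand-in for ACaseClass, and int is a helper I am assuming here, not something from the question. It also assumes the scala-parser-combinators module is on the classpath, since that module has shipped separately from the standard library since Scala 2.11.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical case class standing in for ACaseClass from the question.
case class Triple(a: Int, b: Int, c: Int)

object TripleParser extends JavaTokenParsers {
  // Convert the matched digits to Int right where the sub-parser is defined.
  def int: Parser[Int] = wholeNumber ^^ (_.toInt)

  // Delimiters are dropped with ~> and <~, so only the three Ints reach the case.
  // The inner parentheses keep the discard local to the middle element.
  def triple: Parser[Triple] =
    "(" ~> int ~ ("," ~> int <~ ",") ~ int <~ ")" ^^ {
      case a ~ b ~ c => Triple(a, b, c)
    }

  def main(args: Array[String]): Unit =
    println(parseAll(triple, "(42,34,5)"))
}
```

Inserting a new delimiter then only means adding another ~> or <~ next to it; the case pattern changes only when a new value is actually captured.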
Upvotes: 2