satyagraha

Reputation: 665

Inelegant structural repetition in parser combinators

While parsing some complex text, where I need to split out regular expression definitions for reuse and readability, I often end up with Scala code of this general structure (pn is a regex pattern, vn a variable):

val cp1 = p1 ~ p2 ~ p3 ~ p4 ~ p5 ~ p6 ^^
          { case dummy1 ~ v2 ~ dummy3 ~ v4 ~ dummy5 ~ v6 => ACaseClass(v2, v4, v6) }

The obvious issue is readability and maintainability: the useful matches (vn) are separated from the placeholder ones (dummyn), which makes it awkward to insert new patterns.

So, is there a neater way to express the intent? Could I use _ instead for every dummyn?

In the SNOBOL language, one could write (pat . var) or (pat $ var), which would assign the result of the match to the variable; similarly, newer regex dialects offer named capture groups such as (?P<name>pat). The intent is clearly to keep the match capture variable close to the pattern.

So, what I would like to write is something along the general lines of:

val cp1 = p1 ~ ( p2 $$ v2 )  ~ p3 ~ ( p4 $$ v4 ) ~ p5 ~ ( p6 $$ v6 ) $=>
          ACaseClass(v2, v4, v6)

Obviously I am assuming some sort of new operators $$ and $=> which enable this simpler syntax.

Conceivably macros could help, but they are rather beyond my abilities at present. Any input welcome!

Upvotes: 0

Views: 97

Answers (1)

ziggystar

Reputation: 28670

Why didn't you try using _? It turns out that it works. You can also use ~> and <~ to discard parts of your pattern, although you'll need parentheses if you want to discard inner parts, because <~ binds less tightly than ~ and ~>.

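import scala.util.parsing.combinator.JavaTokenParsers

object SimpleScala extends JavaTokenParsers {

  // ~> keeps only the right-hand result, <~ only the left-hand one
  def test = "(" ~> wholeNumber ~ ("," ~> wholeNumber <~ ",") ~ wholeNumber <~ ")" ^^
    { case i1 ~ i2 ~ i3 => (i1, i2, i3) }

  // Alternatively, match every part with ~ and ignore the unwanted ones with _
  def test2 = "(" ~ wholeNumber ~ "," ~ wholeNumber ~ ")" ^^
    { case _ ~ i1 ~ _ ~ i2 ~ _ => (i1, i2) }

  def main(args: Array[String]): Unit = {
    println(parseAll(test, "(42,34,5)"))
    println(parseAll(test2, "(42,345)"))
  }
}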
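
Applied to the shape in the question, the same operators let each kept value sit right next to the delimiters that are thrown away, so only the useful matches reach the case pattern. The following is a minimal sketch, not a definitive solution: Triple is a hypothetical stand-in for ACaseClass, the "(" / "," / ")" delimiters are just illustrative placeholders for the dummy patterns, and it assumes the scala-parser-combinators module is on the classpath.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical case class standing in for ACaseClass from the question.
case class Triple(a: Int, b: Int, c: Int)

object DiscardSketch extends JavaTokenParsers {
  // Each kept value is grouped with the delimiter that follows it.
  // ~ and ~> bind tighter than <~, so the inner <~ uses need parentheses;
  // the trailing <~ ")" and the final ^^ apply to the whole sequence.
  def triple: Parser[Triple] =
    "(" ~> (wholeNumber <~ ",") ~ (wholeNumber <~ ",") ~ wholeNumber <~ ")" ^^ {
      case a ~ b ~ c => Triple(a.toInt, b.toInt, c.toInt)
    }

  def main(args: Array[String]): Unit =
    println(parseAll(triple, "(42,34,5)"))
}
```

Inserting a new discarded pattern then only means adding another ~> or <~ at that spot; the case pattern changes only when a new kept value is added.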

Upvotes: 2
