Kevin Meredith

Reputation: 41919

Setting Whitespace as Delimiter in JavaTokenParsers

Extending JavaTokenParsers, I have the following:

class Foo extends JavaTokenParsers { 
  lazy val check = id ~ action ~ obj

  lazy val id     = "FOO" | "BAR"
  lazy val action = "GET" | "SET"
  lazy val obj = "BAZ" | "BIZ"
}

I had assumed that whitespace would act as a delimiter, so I was surprised when check successfully parsed the expression FOO GETBAZ:

val result = parseAll(check, "FOO GETBAZ")
println(result.get)

Result

((FOO~GET)~BAZ)

How can I make whitespace a required delimiter, so that the input above fails to parse because GETBAZ matches neither of action's alternatives, GET or SET?

Upvotes: 1

Views: 259

Answers (2)

Will Fitzgerald

Reputation: 1382

Override the skipWhitespace method:

object Foo extends JavaTokenParsers { 
  lazy val check = id ~ action ~ obj
  lazy val id     = "FOO" | "BAR"
  lazy val action = "GET" | "SET"
  lazy val obj =  "BAZ" | "BIZ"
  // skipWhitespace is declared without a parameter list in RegexParsers, so override it the same way
  override def skipWhitespace = false
}

See:

scala> Foo.parseAll(Foo.check, "FOOGETBAZ").isEmpty
res0: Boolean = false

scala> Foo.parseAll(Foo.check, "FOOGET BAZ").isEmpty
res1: Boolean = true

Upvotes: 0

Travis Brown

Reputation: 139048

JavaTokenParsers adds some methods to RegexParsers, but it doesn't change the behavior of literal, which will match its argument without worrying about what's around it.

The skipWhitespace setting isn't going to help you, either, since it only specifies whether whitespace will be ignored—not whether it's required.
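To make that concrete, here is a small sketch using the grammar from the question with the default settings. It assumes the scala-parser-combinators library is on the classpath; SkipDemo and accepts are made-up names used only for illustration.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical demo object; the grammar mirrors the one in the question.
object SkipDemo extends JavaTokenParsers {
  val id     = "FOO" | "BAR"
  val action = "GET" | "SET"
  val obj    = "BAZ" | "BIZ"

  lazy val check = id ~ action ~ obj

  // True when the whole input is consumed by check.
  def accepts(input: String): Boolean = parseAll(check, input).successful
}
```

With skipWhitespace left at its default, accepts should return true for "FOO GET BAZ", "FOO GETBAZ", and "FOOGETBAZ" alike: whitespace before a literal is dropped when it is there, and nothing complains when it is not.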

You have a couple of options. One would be to use regular expressions with word boundaries:

class Foo extends JavaTokenParsers {
  def word(s: String): Parser[String] = regex(s"\\b$s\\b".r)

  lazy val check = id ~ action ~ obj    

  val id     = word("FOO") | word("BAR")
  val action = word("GET") | word("SET")
  val obj    = word("BAZ") | word("BIZ")
}

Or ident:

class Foo extends JavaTokenParsers {
  def word(s: String): Parser[String] = ident.filter(_ == s)

  lazy val check = id ~ action ~ obj    

  val id     = word("FOO") | word("BAR")
  val action = word("GET") | word("SET")
  val obj    = word("BAZ") | word("BIZ")
}

Or you could manually add whitespace parsers between each of your items.
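A rough sketch of that last option, assuming the scala-parser-combinators library is available. The object name ExplicitWhitespace and the ws helper are my own inventions for this example, not part of the library. Automatic skipping is switched off so that the only whitespace consumed is the whitespace the grammar asks for.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical example object; names other than the library's are illustrative.
object ExplicitWhitespace extends JavaTokenParsers {
  // Disable automatic skipping so whitespace has to be matched explicitly.
  override def skipWhitespace: Boolean = false

  // Illustrative helper: one or more whitespace characters.
  val ws: Parser[String] = """\s+""".r

  val id     = "FOO" | "BAR"
  val action = "GET" | "SET"
  val obj    = "BAZ" | "BIZ"

  // Require whitespace after id and after action; <~ discards the matched whitespace.
  lazy val check = (id <~ ws) ~ (action <~ ws) ~ obj
}
```

With this grammar, ExplicitWhitespace.parseAll(ExplicitWhitespace.check, "FOO GET BAZ") should succeed, while "FOO GETBAZ" should fail because no whitespace follows GET. The trade-off is that leading or trailing whitespace in the input is no longer tolerated unless you add ws (or opt(ws)) there as well.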

I'd probably go with the \b solution, but it's largely a matter of taste.

Upvotes: 1
