Saveyy

Reputation: 75

How to split a string on multiple regular expressions while keeping the splitting characters

I'm writing a lexer/scanner for the first time and have run into a problem splitting the input string. Example:

val result = "func add(Num x, Num y) = x+y;".split(???) 
result == Array("func", "add", "(", "Num", "x", ",", "Num", "y", ")", "=", "x", "+", "y", ";")

The problem is that I can't simply split on whitespace characters, because doing so wouldn't separate add from (, for example.

Any help with this?

Upvotes: 2

Views: 102

Answers (2)

Joan

Reputation: 4300

Look into http://www.scala-lang.org/api/rc/index.html#scala.util.parsing.combinator.RegexParsers

Here is an unfinished example:

import scala.util.parsing.combinator.RegexParsers

trait Element

case class Function(name: String,
                    params:Map[String, String],
                    expression:Seq[String]) extends Element

case class Class(name: String,
                 params: Map[String,String],
                 body: Seq[String]) extends Element

object LanguageParser extends RegexParsers {

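  // An identifier: a letter or underscore followed by letters, digits or underscores.
  val name: Parser[String] = "[A-Za-z_][A-Za-z0-9_]*".r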

  val `type`: Parser[String] = ???

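  // Comma-separated "Type name" pairs, keyed by parameter name so that
  // two parameters of the same type do not overwrite each other.
  val parameters: Parser[Map[String,String]] = "(" ~> repsep(`type` ~ name, ",") <~ ")" ^^ {
    pairs => (pairs map {
      case t ~ n => n -> t
    }).toMap
  }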

  val expression: Parser[Seq[String]] = ???

  val function: Parser[Function] =
    "func " ~> name ~ parameters ~ "="~ expression ^^ {
      case name ~  params ~ _ ~ expr => Function(name, params, expr)
    }
  
  val `class`: Parser[Class] =
    "class " ~> name ~ parameters ~ "{" ~ expression ~ "}" ^^ {
      case name ~  params ~ _ ~ expr ~_ => Class(name, params, expr)
    }

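  // Parsers for values and if/else are not written yet; add them to this alternation once they exist.
  val topLevelParsers: Parser[Element] =
    function |
      `class`

  // Parses a single statement, failing loudly if it does not match.
  def parse(s: String): Element = parseAll(topLevelParsers, s.trim) getOrElse
    (throw new IllegalArgumentException("Could not parse the given string: " + s.trim))

  // Parses a whole program by treating ";" as the statement separator.
  def parseAll(s: String): Seq[Element] =
    s.split(";").toSeq.map(statement => parse(statement))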
}
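
If all you need for now is the token list from the question, the same RegexParsers trait can probably be used as a plain lexer. The following is only a minimal sketch: SimpleLexer, word, symbol and tokenize are hypothetical names that are not part of the example above, and it assumes the scala-parser-combinators module is on the classpath (since Scala 2.11 it ships separately from the standard library).

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical lexer sketch; the names here are illustrative only.
object SimpleLexer extends RegexParsers {
  // A run of word characters: identifiers, keywords and numbers.
  val word: Parser[String] = "\\w+".r
  // Any single character that is neither a word character nor whitespace.
  val symbol: Parser[String] = "[^\\w\\s]".r
  // RegexParsers skips whitespace before each token by default.
  val tokens: Parser[List[String]] = rep(word | symbol)

  def tokenize(input: String): List[String] =
    parseAll(tokens, input) match {
      case Success(result, _) => result
      case failure: NoSuccess => throw new IllegalArgumentException(failure.msg)
    }
}
```

Calling SimpleLexer.tokenize("func add(Num x, Num y) = x+y;") should give the fourteen tokens listed in the question. Every symbol is a single character here, so a multi-character operator would need its own alternative placed before symbol.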

Cheers

Upvotes: 0

SamWhan

Reputation: 8332

This will give you a bunch of empty items that your EE will have to handle, but adding a word boundary (\b) to the pattern should do it.

Check example at regex101.

I.e. ...split("\\s|\\b") in Scala, where the backslashes must be escaped inside the string literal (the regex itself is \s|\b).
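
As a rough sketch of how that could look in Scala (SplitLexer and tokenize are made-up names for illustration), the empty items can be filtered out right after the split:

```scala
// Hypothetical helper; the names are illustrative only.
object SplitLexer {
  // Split on whitespace or at word boundaries. The zero-width \b matches
  // leave empty strings behind, which are dropped afterwards.
  def tokenize(input: String): Seq[String] =
    input.split("\\s|\\b").toSeq.filter(_.nonEmpty)
}
```

For the input in the question this yields the expected fourteen tokens. Note that there is no word boundary between two adjacent non-word characters, so something like () with nothing in between would stay together as a single token.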

Regards

Upvotes: 1
