Michael

Reputation: 42100

How to implement "unescape" in Scala?

This is a follow-up to my previous question

Thanks to the answers, I realized that the escape function is actually a flatMap with an argument f: Char => Seq[Char] that maps escaped characters to escape sequences (see the answers).
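For reference, a minimal sketch of escape written as a flatMap in that sense. The question's actual escape table isn't shown here, so the three entries below (and the names `EscapeSketch` and `escapeChar`) are hypothetical.

```scala
object EscapeSketch {
  // Hypothetical escape table: each character maps to a sequence of characters
  def escapeChar(c: Char): Seq[Char] = c match {
    case '&'   => "&amp;".toSeq
    case '<'   => "&lt;".toSeq
    case '>'   => "&gt;".toSeq
    case other => Seq(other) // every other character maps to itself
  }

  // escape is a flatMap with f: Char => Seq[Char]; mkString turns the chars back into a String
  def escape(s: String): String = s.flatMap(c => escapeChar(c)).mkString
}
```

For example, `EscapeSketch.escape("a<b & c")` gives `"a&lt;b &amp; c"`. The inverse is harder because one output character comes from several input characters, which is what the question below is about.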

Now I wonder how to implement unescape as the reverse operation of escape. I guess it should be the reverse of flatMap, with an argument f: Seq[Char] => Char. Does that make sense? How would you suggest implementing unescape?

Upvotes: 0

Views: 1580

Answers (2)

Erik Kaplun

Reputation: 38227

This seems to be a follow-up to my own answer to the previous question. Use scala.xml.Utility.unescape:

val sb = new StringBuilder
scala.xml.Utility.unescape("amp", sb)
println(sb.toString) // prints &

or if you just want to unescape once and throw away the StringBuilder instance:

scala.xml.Utility.unescape("amp", new StringBuilder).toString // returns "&"

This only handles individual escapes. You'll have to build a parser for entire XML strings around it yourself (the accepted answer seems to provide that part, but it reinvents the scala.xml.Utility wheel), or use something from scala.xml instead.
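One possible shape for such a wrapper, sketched under assumptions: it finds `&name;` references with a regex and looks each name up through a `lookup` function. To keep the sketch runnable without the scala-xml module, the default lookup is a hypothetical map of the five predefined XML entities rather than scala.xml.Utility.unescape itself, and the names `WholeStringUnescape`, `unescapeAll` and `predefined` are illustrative.

```scala
import scala.util.matching.Regex

object WholeStringUnescape {
  // Matches references of the form &name; (letters/digits only, so numeric
  // references like &#60; are not handled; an assumption of this sketch)
  private val EntityRef: Regex = "&([A-Za-z0-9]+);".r

  // Hypothetical stand-in for scala.xml.Utility.unescape: the five predefined XML entities
  val predefined: Map[String, String] =
    Map("amp" -> "&", "lt" -> "<", "gt" -> ">", "quot" -> "\"", "apos" -> "'")

  // Replaces every reference the lookup knows; unknown references are left as they are.
  // quoteReplacement stops '$' or '\' in a replacement from being treated specially.
  def unescapeAll(s: String, lookup: String => Option[String] = predefined.get): String =
    EntityRef.replaceAllIn(s, m =>
      Regex.quoteReplacement(lookup(m.group(1)).getOrElse(m.matched)))
}
```

Because `replaceAllIn` scans left to right and does not revisit replaced text, `&amp;lt;` becomes `&lt;` rather than `<`. With scala-xml on the classpath, something like `name => Option(scala.xml.Utility.unescape(name, new StringBuilder)).map(_.toString)` could presumably be passed as the lookup, assuming `unescape` returns `null` for unknown names (worth checking against your scala-xml version).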

Upvotes: 1

Alexey Romanov
Alexey Romanov

Reputation: 170805

I guess it should be the reverse of flatMap, with a function f: Seq[Char] => Char. Does that make sense?

Not really. What should your inverse function f: Seq[Char] => Char return on "abc"? It would have to accept any sequence of characters and return a single character. You could try using PartialFunction[Seq[Char], Char] instead, but you'll run into other problems: do you apply it to every subsequence of your input?
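To make the subsequence problem concrete, here is a sketch (not part of the answer) that stores a hypothetical inverse table as a `Map`, which is already a `PartialFunction`, and at every position tries each candidate window up to the longest key. `Vector[Char]` is used as the key type so lookups compare values of the same type; all names are illustrative.

```scala
object PartialFunctionUnescape {
  // Hypothetical inverse table, defined only on complete escape sequences
  val table: Map[Vector[Char], Char] =
    Map("&amp;" -> '&', "&lt;" -> '<', "&gt;" -> '>').map { case (k, v) => k.toVector -> v }

  // A Map is a PartialFunction, so isDefinedAt and apply come for free
  val inverse: PartialFunction[Vector[Char], Char] = table

  // No window longer than the longest key can ever match
  private val maxLen = table.keys.map(_.length).max

  def unescape(s: String): String = {
    val cs = s.toVector
    val out = new StringBuilder
    var i = 0
    while (i < cs.length) {
      // Try every subsequence starting at i, shortest first
      val hit = (1 to maxLen).iterator
        .map(len => cs.slice(i, i + len))
        .find(inverse.isDefinedAt)
      hit match {
        case Some(seq) => out += inverse(seq); i += seq.length // replace the escape
        case None      => out += cs(i); i += 1                 // copy the character
      }
    }
    out.toString
  }
}
```

It works, but it probes up to `maxLen` subsequences per input character, whereas an approach that reacts to `&` and `;` directly reads each character once.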

The more general solution would be to use foldLeft with the accumulator type containing both the built-up part of the result and the escaping sequence, something like (untested):

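val unescapeMap = Map("amp" -> "&", "lt" -> "<" /* , ... */) // entity name -> replacement

def unescape(str: String): String = {
  // Accumulator: (result built so far, Some(escape name read so far) inside "&...;", else None)
  val result = str.foldLeft[(String, Option[String])](("", None)) { case ((acc, escapedAcc), c) =>
    (c, escapedAcc) match {
      case ('&', None) =>              // start of an escape sequence
        (acc, Some(""))
      case (_, None) =>                // ordinary character: copy it
        (acc + c, None)
      case ('&', Some(_)) =>
        throw new IllegalArgumentException("nested escape sequences")
      case (';', Some(escapedAcc1)) => // end of the escape: look up its replacement
        (acc + unescapeMap(escapedAcc1), None) // NoSuchElementException for unknown names
      case (_, Some(escapedAcc1)) =>   // inside an escape: collect the name
        (acc, Some(escapedAcc1 + c))
    }
  }

  result match {
    case (escaped, None) =>
      escaped
    case (_, Some(_)) =>
      throw new IllegalArgumentException("unfinished escape sequence")
  }
}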

(It's much more efficient to use StringBuilders for the accumulators, but this is simpler to understand.)
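A sketch of the StringBuilder variant, under the assumption that a plain loop is acceptable in place of foldLeft: one builder holds the result, the other the escape name being read. The entity table is an illustrative assumption, and the error handling mirrors the foldLeft version.

```scala
object StringBuilderUnescape {
  // Illustrative entity table (hypothetical entries)
  val unescapeMap = Map("amp" -> "&", "lt" -> "<", "gt" -> ">")

  def unescape(str: String): String = {
    val out = new StringBuilder  // built-up result
    val name = new StringBuilder // escape name read so far
    var inEscape = false
    for (c <- str) {
      if (!inEscape) {
        if (c == '&') { inEscape = true; name.clear() } // start of an escape
        else out += c                                  // ordinary character
      } else {
        c match {
          case '&' => throw new IllegalArgumentException("nested escape sequences")
          case ';' =>
            out ++= unescapeMap(name.toString) // NoSuchElementException if unknown
            inEscape = false
          case _ => name += c
        }
      }
    }
    if (inEscape) throw new IllegalArgumentException("unfinished escape sequence")
    out.toString
  }
}
```

Appending to a StringBuilder is amortized constant time, while `acc + c` on an immutable String copies the whole accumulator each time, which is where the efficiency difference comes from.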

But for this specific case you could just split the string on &, then split each part except the first on ;, and get the parts you want that way.
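A minimal sketch of that split-based approach, assuming a hypothetical three-entry table and the names `SplitUnescape` and `unescape`. Everything before the first `&` is plain text; in each later part, the text up to the first `;` is an entity name and the rest is plain text again.

```scala
object SplitUnescape {
  // Illustrative entity table (hypothetical entries)
  val unescapeMap = Map("amp" -> "&", "lt" -> "<", "gt" -> ">")

  def unescape(str: String): String = {
    // limit -1 keeps trailing empty parts, so "x&amp;" still yields two parts
    val parts = str.split("&", -1)
    val rest = parts.tail.map { part =>
      // limit 2: split only on the first ';' (name, remaining plain text)
      part.split(";", 2) match {
        case Array(name, text) => unescapeMap(name) + text // NoSuchElementException if unknown
        case _ => throw new IllegalArgumentException("unfinished escape sequence")
      }
    }
    parts.head + rest.mkString
  }
}
```

Nested escapes such as `&a&b;` are rejected here too, because the part `a` contains no `;`.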

Upvotes: 2
