user972946

How to revert XML escaped characters (XML unescape)?

I would like a Scala function that returns the String & when given the input &amp;, and likewise for all other XML-escaped characters.

I have tried xml.Unparsed, perhaps incorrectly, but it does not give me the desired output:

scala> val amp = '&'
amp: Char = &

scala> <a>{amp}</a>.toString
res0: String = <a>&amp;</a>

scala> import scala.xml._
import scala.xml._

scala> <a>{amp}</a>.child(0)
res1: scala.xml.Node = &amp;

scala> xml.Unparsed(<a>{amp}</a>.child(0).toString)
res2: scala.xml.Unparsed = &amp;

I have also attempted to use xml.Utility.unescape, but it does not give any output at all:

scala> val sb = new StringBuilder
sb: StringBuilder = 

scala> xml.Utility.unescape("&amp;", sb)
res0: StringBuilder = null

scala> sb.toString
res1: String = ""

Upvotes: 5

Views: 3180

Answers (2)

steph

Reputation: 1

I have not found anything in scala.xml.Utility... I did it quick and dirty with this:

def unescape(text: String): String = {
  // Match each complete entity, including '&' and ';', so that ordinary
  // text such as "salt;" or a lone '&' passes through unchanged.
  @annotation.tailrec
  def recUnescape(textList: List[Char], acc: StringBuilder): String =
    textList match {
      case Nil => acc.toString
      case '&' :: 'a' :: 'm' :: 'p' :: ';' :: tail => recUnescape(tail, acc.append('&'))
      case '&' :: 'q' :: 'u' :: 'o' :: 't' :: ';' :: tail => recUnescape(tail, acc.append('"'))
      case '&' :: 'a' :: 'p' :: 'o' :: 's' :: ';' :: tail => recUnescape(tail, acc.append('\''))
      case '&' :: 'l' :: 't' :: ';' :: tail => recUnescape(tail, acc.append('<'))
      case '&' :: 'g' :: 't' :: ';' :: tail => recUnescape(tail, acc.append('>'))
      // Anything else is copied as-is.
      case x :: tail => recUnescape(tail, acc.append(x))
    }
  recUnescape(text.toList, new StringBuilder)
}

Upvotes: 0

themel

Reputation: 8895

If you just want to get unescaped strings out of XML objects, text is your friend:

scala> val el = <a>{amp}</a>
el: scala.xml.Elem = <a>&amp;</a>
scala> el.child(0)
res4: scala.xml.Node = &amp;
scala> el.child(0).text
res5: String = &

The implementation of this is in scala.xml.EntityRef. Getting a function that does precisely what you're asking for is less straightforward: the library doesn't parse text itself (the Java SAX parser does that), so you would first need to turn your "&amp;" into an EntityRef before you could call text on it. That seems like a massive amount of waste given how simple the implementation of text is.
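
Since the mapping involved is so small, a string-level function can also be written directly. What follows is only a minimal sketch using the standard library, not something scala.xml provides: the object name XmlUnescape and its unescape method are made up for illustration, and it assumes you only need the five predefined XML entities plus numeric character references. Anything it does not recognise is left untouched.

```scala
import scala.util.Try
import scala.util.matching.Regex

// Hypothetical helper, not part of scala.xml.
object XmlUnescape {
  // The five entities predefined by the XML specification.
  private val predefined: Map[String, String] = Map(
    "amp" -> "&", "lt" -> "<", "gt" -> ">", "quot" -> "\"", "apos" -> "'"
  )

  // Matches &#x41; (hex), &#65; (decimal) and &name; references.
  private val reference: Regex = "&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z]+);".r

  // Turns a numeric code point into a String; None if it is out of range.
  private def fromCodePoint(digits: String, radix: Int): Option[String] =
    Try(new String(Character.toChars(Integer.parseInt(digits, radix)))).toOption

  def unescape(text: String): String =
    reference.replaceAllIn(text, m => {
      val body = m.group(1)
      val decoded =
        if (body.startsWith("#x")) fromCodePoint(body.drop(2), 16)
        else if (body.startsWith("#")) fromCodePoint(body.drop(1), 10)
        else predefined.get(body)
      // Unknown references are kept verbatim; quoteReplacement stops '$'
      // and '\' in the result from being read as regex replacement syntax.
      Regex.quoteReplacement(decoded.getOrElse(m.matched))
    })
}
```

The input is scanned once, so XmlUnescape.unescape("&amp;lt;") gives &lt; rather than <. As for the null in the question: xml.Utility.unescape looks up a bare entity name, so it expects "amp" rather than "&amp;" and returns null for anything that is not a predefined entity.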

Upvotes: 6
