Reputation: 3
In Scala, how can I transform:
<p>here we have a <a href="http://www.scala-lang.org/api/current/index.html">link</a> example.</p>
to
here we have a \url{http://www.scala-lang.org/api/current/index.html}{link} example.
where <p></p> maps to "nothing", and <a href="_">_</a> maps to \url{_}{_}?
Upvotes: 0
Views: 758
Reputation: 478
A more general approach is to use a parser, such as Scala's parser combinators or one of the parsers available for Java. If the input is well-formed XML, processing it as XML works too.
Upvotes: 0
Reputation: 67888
As an alternative, if you need more transformations*, you can start with this. It also works with nested <a/> tags, whatever sense that may make.
The code still needs escape handling: some characters are escaped in XML but not in LaTeX, and vice versa. Feel free to add this.
import scala.xml._
val input = <p>And now try it on a <a href="link1">text</a> with <a href="link2">two urls</a></p>
// Build \url{href}{text}; a missing href becomes an empty string.
def mkURL(meta: MetaData, text: String): String = {
  val url = meta.asAttrMap.get("href")
  "\\url{%s}{%s}".format(url getOrElse "", text)
}
// Recursively convert nodes: drop <p>, rewrite <a>, keep everything else as-is.
def transform(xhtml: NodeSeq): String =
  xhtml.map {
    case Node("p", _, ch @ _*)    => transform(ch)
    case Node("a", meta, ch @ _*) => mkURL(meta, transform(ch))
    case x                        => x.toString
  }.mkString
println(transform(input))
// And now try it on a \url{link1}{text} with \url{link2}{two urls}
[*] Adding support for \emph would be something like
case Node("em", _, ch@_*) => transform(ch).mkString("\\emph{", "", "}")
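The escape handling mentioned above could be sketched roughly as follows. This is only a sketch: LatexEscape and its replacement table are hypothetical names that are not part of the answer, and the set of characters covered is an assumption about what the target LaTeX document needs.

```scala
// Hypothetical helper (not from the original answer): escapes characters
// that have a special meaning in LaTeX text mode.
object LatexEscape {
  // Assumed replacement table; extend or trim it for your document class.
  private val replacements: Map[Char, String] = Map(
    '\\' -> "\\textbackslash{}",
    '{'  -> "\\{",
    '}'  -> "\\}",
    '$'  -> "\\$",
    '&'  -> "\\&",
    '%'  -> "\\%",
    '#'  -> "\\#",
    '_'  -> "\\_",
    '~'  -> "\\textasciitilde{}",
    '^'  -> "\\textasciicircum{}"
  )

  // Replace each special character; leave all other characters unchanged.
  def escape(s: String): String =
    s.map(c => replacements.getOrElse(c, c.toString)).mkString
}
```

One way to plug it in, assuming scala-xml's Text extractor, would be to add a case such as `case Text(t) => LatexEscape.escape(t)` before the catch-all case in transform, so that text nodes are unescaped from XML and re-escaped for LaTeX. Whether the URL argument should be escaped the same way depends on the LaTeX package in use.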
Upvotes: 3
Reputation: 20627
Define regexps:
scala> val link = """<a href="(.+)">(.+)</a>""".r
link: scala.util.matching.Regex = <a href="(.+)">(.+)</a>
scala> val paragraph = """<p>(.+)</p>""".r
paragraph: scala.util.matching.Regex = <p>(.+)</p>
scala> val text = """<p>here we have a <a href="http://www.scala-lang.org/api/current/index.html">link</a> example.</p>"""
text: java.lang.String = <p>here we have a <a href="http://www.scala-lang.org/api/current/index.html">link</a> example.</p>
Apply them to the input:
scala> val modifiedText = paragraph.replaceAllIn(text, {matched => val paragraph(content) = matched; content})
modifiedText: String = here we have a <a href="http://www.scala-lang.org/api/current/index.html">link</a> example.
scala> link.replaceAllIn(modifiedText, {matched => val link(href, title) = matched; "\\\\url{%s}{%s}" format(href, title)})
res11: String = here we have a \url{http://www.scala-lang.org/api/current/index.html}{link} example.
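Note that the greedy (.+) groups above would likely treat a line containing two links as a single match, with the first group swallowing everything up to the last `">`. A minimal sketch of a variant with non-greedy groups follows; HtmlToLatex and convert are hypothetical names, and it still assumes each tag sits on one line with href as the only attribute.

```scala
import scala.util.matching.Regex

// Hypothetical wrapper (not from the original answer) around the two regexes.
object HtmlToLatex {
  // Non-greedy groups so that each <a> element is matched separately.
  private val link = """<a href="(.+?)">(.+?)</a>""".r
  private val paragraph = """<p>(.+?)</p>""".r

  def convert(html: String): String = {
    // quoteReplacement keeps '\' and '$' in the replacement text literal.
    val withoutParagraphs =
      paragraph.replaceAllIn(html, m => Regex.quoteReplacement(m.group(1)))
    link.replaceAllIn(withoutParagraphs, m =>
      Regex.quoteReplacement("\\url{%s}{%s}".format(m.group(1), m.group(2))))
  }
}

object HtmlToLatexDemo extends App {
  println(HtmlToLatex.convert(
    """<p>a <a href="link1">text</a> with <a href="link2">two urls</a></p>"""))
  // prints: a \url{link1}{text} with \url{link2}{two urls}
}
```

Using Regex.quoteReplacement instead of doubling the backslashes by hand also protects against a `$` inside the link target or text, which would otherwise be read as a group reference.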
Upvotes: -1