Reputation: 557
My application is trying to embed an HTML document in an XML document.
val xml =
  <document>
    <id> { getId } </id>
    <content>
      { getContent }
    </content>
  </document>
getId is a simple function that returns a new sequence number. The problem is in getContent:
def getContent = {
  val wrapped = "<wrap>" + article.content + "</wrap>"
  XML.loadString(wrapped).child
}
As you can see, article.content returns a String that holds a real-world HTML document. scala.xml.XML.loadString parses it into XML, and .child returns the list of child nodes, which are then embedded in the xml val correctly.
However, this only works when the HTML is also well-formed XML, e.g. <body>Hello world</body>
Some articles contain markup like this: <body><strong>Hello world</body>
which lacks the closing tag for the <strong> element. (Yes, I can't just blame the user!)
In this case, parsing throws an exception and the application stops.
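A minimal sketch that reproduces the failure, assuming scala-xml is on the classpath. `tryParse` is a hypothetical helper name introduced only for illustration; wrapping the call in `Try` at least keeps the exception from stopping the application:

```scala
import scala.util.Try
import scala.xml.{Elem, XML}

// Hypothetical helper: wrap the HTML exactly as getContent does, but capture
// the parser's exception (org.xml.sax.SAXParseException) instead of letting it propagate.
def tryParse(html: String): Try[Elem] =
  Try(XML.loadString("<wrap>" + html + "</wrap>"))

// Well-formed input parses; the unclosed <strong> does not.
val ok     = tryParse("<body>Hello world</body>")
val broken = tryParse("<body><strong>Hello world</body>")

println(ok.isSuccess)
println(broken.isFailure)
```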
Is there any way to either bypass the validation or simply embed the HTML as a string within the XML document, without parsing it?
Any suggestions are welcome.
Upvotes: 0
Views: 320
Reputation: 49705
Both JSoup and TagSoup (amongst others) are suitable for parsing HTML that isn't also well-formed XML.
You'll have to decide which is best for your own use-case.
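As a rough sketch of the JSoup route, assuming the jsoup and scala-xml libraries are both on the classpath: let JSoup repair the markup, serialize it with XML syntax, and only then hand it to scala.xml. `cleanToXml` is a hypothetical helper name, and the settings below are one reasonable choice rather than the only one.

```scala
import org.jsoup.Jsoup
import org.jsoup.nodes.Document.OutputSettings
import org.jsoup.nodes.Entities
import scala.xml.{Elem, XML}

// Hypothetical helper: repair tag soup with JSoup, then parse the result with scala.xml.
def cleanToXml(html: String): Elem = {
  val doc = Jsoup.parse(html)                // lenient HTML parser; balances unclosed tags
  doc.outputSettings()
    .syntax(OutputSettings.Syntax.xml)       // serialize as XML, e.g. self-closed void elements
    .escapeMode(Entities.EscapeMode.xhtml)   // avoid HTML-only named entities such as &nbsp;
    .prettyPrint(false)                      // do not inject extra whitespace
  val wellFormed = doc.body().html()         // inner markup of <body>, now with balanced tags
  XML.loadString("<wrap>" + wellFormed + "</wrap>")
}
```

With this in place, getContent from the question could become `cleanToXml(article.content).child`. Note that JSoup decides how to repair the markup, so the resulting tree may not match what the author intended for badly broken input.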
Upvotes: 2