Reputation: 1241
The Scala snippet below uses Jsoup to strip every HTML tag from a string except those explicitly on the white list:
import org.jsoup.Jsoup
import org.jsoup.safety.Whitelist

val whiteList = Whitelist.none().addTags(
  "b", "br", "ul", "ol", "li", "em", "h4", "h5", "hr", "pre", "sub", "sup"
)
Jsoup.clean("some unsafe text", whiteList)
This indiscriminately strips all CSS styling and element attributes from the tags in the text, which is what I want in the general case. However, I would like it to retain the direction CSS property, or alternatively the dir attribute, on the block elements in the white list.
An answer written in Java is fine too.
Upvotes: 1
Views: 262
Reputation: 1241
I solved it by passing the unsafe text to a custom recursive method like this:
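import org.jsoup.Jsoup
import org.jsoup.nodes.Element

val whiteList = List(
  "b", "br", "ul", "ol", "li", "em", "h4", "h5", "hr", "pre", "sub", "sup"
)

def clean(raw: String): String = {
  def traverseAndClean(elem: Element): Unit = {
    if (!whiteList.contains(elem.tagName())) {
      // Drops the element together with everything inside it, text included.
      elem.remove()
    } else {
      // asList() returns a copy, so removing attributes while looping is safe.
      elem.attributes().asList().forEach { attr =>
        val key = attr.getKey
        if (key != "dir") elem.removeAttr(key)
      }
      // children() returns a fresh list, so removing a child mid-loop is safe.
      elem.children().iterator().forEachRemaining(traverseAndClean)
    }
  }
  val doc = Jsoup.parseBodyFragment(raw)
  doc.body().children().iterator().forEachRemaining(traverseAndClean)
  doc.body().html()
}

clean("my unsafe text")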
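Another option may be to stay with Jsoup.clean and allow the attribute on the white list itself. The sketch below assumes a jsoup version that still ships org.jsoup.safety.Whitelist (newer releases call the same class Safelist) and relies on its addAttributes method. The object name DirPreservingClean and the choice of which tags count as block elements are illustrative assumptions, not something taken from the question.

```scala
import org.jsoup.Jsoup
import org.jsoup.safety.Whitelist // called Safelist in newer jsoup releases

// Hypothetical wrapper object, named here only for illustration.
object DirPreservingClean {
  // Tags that survive cleaning (same list as in the question).
  private val allowedTags = Seq(
    "b", "br", "ul", "ol", "li", "em", "h4", "h5", "hr", "pre", "sub", "sup"
  )

  // Assumption: these are the block-level tags that may carry `dir`.
  private val blockTags = Seq("ul", "ol", "li", "h4", "h5", "pre")

  private val whiteList: Whitelist = {
    val wl = Whitelist.none().addTags(allowedTags: _*)
    // addAttributes mutates the white list, permitting `dir` on each block tag.
    blockTags.foreach(tag => wl.addAttributes(tag, "dir"))
    wl
  }

  // All other attributes and all non-listed tags are still stripped.
  def clean(raw: String): String = Jsoup.clean(raw, whiteList)
}
```

Calling DirPreservingClean.clean("<ul dir=\"rtl\" class=\"x\"><li>a</li></ul>") should keep the dir attribute and drop class. This only covers the dir attribute: as far as I know the white list cannot filter individual CSS properties, so allowing style in order to keep direction would let every other inline style through as well.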
Upvotes: 1