mcspiral

Reputation: 157

Java - Escape HTML characters excluding some characters

I'm trying to escape special characters in my HTML code, except for the characters used in markup (<, >, ", ', and &). I looked at existing libraries (e.g. StringEscapeUtils), but all of them also escape <, >, ", ', and &, which are the characters I don't want to escape.

For example, if I have

<div>— £</div>

I want it to be converted to

<div>&mdash; &pound;</div>

I DON'T want it to be

&lt;div&gt;&mdash; &pound;&lt;/div&gt;

Is there any way to do this in Java?

Upvotes: 1

Views: 1167

Answers (1)

laune

Reputation: 31290

Add this class to your code. (It has to be declared in the package org.apache.commons.lang, because Entities and the entity tables it uses are package-scoped.)


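package org.apache.commons.lang;

public class Fix extends Entities {
    // Entity set holding only the ISO-8859-1 and HTML 4.0 tables.
    // The basic table (<, >, &, ") is deliberately not registered.
    public static final Entities HTML04;
    static {
        HTML04 = new Entities();
        HTML04.addEntities(ISO8859_1_ARRAY);
        HTML04.addEntities(HTML40_ARRAY);
    }

    // Escapes characters found in the tables above; returns null for null input.
    public static String escapeHtml(String str) {
        if (str == null) {
            return null;
        }
        return HTML04.escape(str);
    }
}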

You can now escape HTML while leaving <, >, & and " untouched:

String html = "<div> & — £ \"</div>";
System.out.println(Fix.escapeHtml(html));

Output:

<div> & &mdash; &pound; "</div>
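
If placing a class inside the Commons Lang package is not an option, a dependency-free alternative is to escape only the non-ASCII characters. The following is a minimal sketch under assumptions of my own, not part of Commons Lang: NonAsciiEscaper and escapeNonAscii are hypothetical names, and it writes numeric character references (&#8212;) instead of named entities (&mdash;). Browsers render both forms the same way, but the output text differs from what the question shows.

```java
public class NonAsciiEscaper {

    // Hypothetical helper (not a Commons Lang API): replaces every code point
    // above 0x7F with a numeric character reference and copies ASCII through
    // unchanged, so <, >, &, " and ' are never touched.
    public static String escapeNonAscii(String str) {
        if (str == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(str.length() + 16);
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            // Advance by 2 for surrogate pairs, 1 otherwise.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u2014 is the em dash and \u00A3 is the pound sign.
        System.out.println(escapeNonAscii("<div>\u2014 \u00A3</div>"));
        // prints <div>&#8212; &#163;</div>
    }
}
```

Iterating by code point rather than by char keeps characters outside the Basic Multilingual Plane intact as a single reference.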

Upvotes: 1
