Nathaniel Flath

Reputation: 16045

How to escape HTML special characters in Java?

Is there a way to convert a string to a string that will display properly in a web document? For example, changing the string

"<Hello>"

To

"&lt;Hello&gt;"

Upvotes: 16

Views: 66702

Answers (5)

Plcode

Reputation: 233

If you understand the logic behind it, this is easy to do yourself:

public class ConvertToHTMLcode {
    public static void main(String[] args) {
        String specialSymbols = "ễ%ß Straße";
        // prints: &#7877;%&#223; Stra&#223;e
        System.out.println(convertToHTMLCodes(specialSymbols));
    }

    // Replaces every char above 127 with a decimal numeric character reference.
    // Note: <, >, & and quotes are passed through unchanged.
    public static String convertToHTMLCodes(String str) {
        StringBuilder sb = new StringBuilder();
        int len = str.length();
        for (int i = 0; i < len; ++i) {
            char c = str.charAt(i);
            if (c > 127) {
                sb.append("&#");
                sb.append(Integer.toString(c, 10));
                sb.append(";");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
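
One caveat with the loop above: it works on individual char values, so a character outside the Basic Multilingual Plane (stored as a surrogate pair) would come out as two references, one per surrogate. A possible variant, sketched here with the made-up names NumericRefs and toNumericRefs, walks the string by code point instead:

```java
public class NumericRefs {
    // Hypothetical helper (not from the answer above): replaces every code
    // point above 127 with a decimal numeric character reference, treating
    // a surrogate pair as a single character.
    public static String toNumericRefs(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            if (cp > 127) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.append((char) cp);
            }
            // Advance by 1 or 2 chars depending on the code point's width
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00df is the sharp s; \uD83D\uDE00 is one code point (U+1F600)
        System.out.println(toNumericRefs("Stra\u00dfe"));
        System.out.println(toNumericRefs("\uD83D\uDE00"));
    }
}
```

Like the original, this leaves <, >, & and quotes alone, so it would still need to be combined with escaping of those characters.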

Upvotes: 0

Amber

Reputation: 527378

StringEscapeUtils from Apache Commons Lang has methods designed for exactly this:

http://commons.apache.org/proper/commons-lang/javadocs/api-3.1/org/apache/commons/lang3/StringEscapeUtils.html

Upvotes: 38

Borislav Gizdov

Reputation: 1960

HTMLEntities is an open-source Java class that contains a collection of static methods (htmlentities, unhtmlentities, ...) to convert special and extended characters into HTML entities and vice versa.

http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=htmlentities

Upvotes: 1

Sorantis

Reputation: 14722

public static String stringToHTMLString(String string) {
    StringBuilder sb = new StringBuilder(string.length());
    // true if last char was blank
    boolean lastWasBlankChar = false;
    int len = string.length();
    char c;

    for (int i = 0; i < len; i++)
        {
        c = string.charAt(i);
        if (c == ' ') {
            // blanks get extra work: replacing every blank with &nbsp;
            // would lose word breaking, so only every second blank
            // in a run becomes &nbsp;
            if (lastWasBlankChar) {
                lastWasBlankChar = false;
                sb.append("&nbsp;");
                }
            else {
                lastWasBlankChar = true;
                sb.append(' ');
                }
            }
        else {
            lastWasBlankChar = false;
            //
            // HTML Special Chars
            if (c == '"')
                sb.append("&quot;");
            else if (c == '&')
                sb.append("&amp;");
            else if (c == '<')
                sb.append("&lt;");
            else if (c == '>')
                sb.append("&gt;");
            else if (c == '\n')
                // Handle newline: emit a line-break tag
                sb.append("<br/>");
            else {
                int ci = 0xffff & c;
                if (ci < 160 )
                    // below U+00A0: append unchanged
                    sb.append(c);
                else {
                    // U+00A0 and above: use a numeric character reference
                    sb.append("&#");
                    sb.append(Integer.toString(ci));
                    sb.append(';');
                    }
                }
            }
        }
    return sb.toString();
}

Upvotes: 2

Laurence Gonsalves

Reputation: 143334

That's usually called "HTML escaping". I'm not aware of anything in the standard libraries for doing this (though you can approximate it by using XML escaping). There are lots of third-party libraries that can do it, however. For example, StringEscapeUtils from org.apache.commons.lang has an escapeHtml method.
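
If pulling in a library is not an option, escaping only the characters that are significant in HTML markup is short enough to write by hand. The following is a minimal sketch using only the standard library; the names HtmlEscaper and escapeHtml are made up for illustration and are not the Commons Lang API:

```java
public class HtmlEscaper {
    // Hypothetical helper: escapes the five characters that are significant
    // in HTML text and quoted attribute values. Everything else is copied.
    public static String escapeHtml(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The example from the question
        System.out.println(escapeHtml("<Hello>")); // prints &lt;Hello&gt;
    }
}
```

This does not produce named entities for accented or other non-ASCII characters; on a page served with a Unicode encoding that is usually unnecessary, but it is an assumption worth checking for your setup.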

Upvotes: 3
