aioobe

Reputation: 421020

Bug in Apache Commons StringEscapeUtils?

I just started using Apache Commons StringEscapeUtils.

According to http://www.w3schools.com/tags/ref_entities.asp, &Ouml; should correspond to Ö. However,

System.out.println(StringEscapeUtils.unescapeHtml4("&Ouml;"));

prints

×

Is this a bug? Or what am I missing?

Upvotes: 2

Views: 1341

Answers (3)

Michael Konietzka

Reputation: 5499

I guess EntityArrays.java from the lang3 repository is buggy:

{"\u00D6", "&Otilde;"}, // Õ - uppercase O, tilde
{"\u00D7", "&Ouml;"}, // Ö - uppercase O, umlaut
{"\u00D8", "&times;"}, // multiplication sign

It seems that some values are shifted by one row. It should be:

 {"\u00D6", "&Ouml;"}, // Ö - uppercase O, umlaut
 {"\u00D7", "&times;"}, // multiplication sign

because Ö is U+00D6 (LATIN CAPITAL LETTER O WITH DIAERESIS)

and × is "\u00D7" (MULTIPLICATION SIGN).
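The code points behind these three entities can be checked without the library. Below is a minimal sketch: the class EntityRowCheck and its small table are hypothetical and only mirror the rows discussed here, with values taken from the Unicode Latin-1 block rather than from the Commons Lang source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EntityRowCheck {
    // Hypothetical excerpt: entity name -> character it should decode to.
    static Map<String, String> expected() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("&Otilde;", "\u00D5"); // uppercase O, tilde
        m.put("&Ouml;", "\u00D6");   // uppercase O, umlaut
        m.put("&times;", "\u00D7");  // multiplication sign
        return m;
    }

    // Formats the first code point of s in the usual U+XXXX notation.
    static String hex(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : expected().entrySet()) {
            System.out.println(e.getKey() + " -> " + hex(e.getValue()));
        }
    }
}
```

A table that pairs &Ouml; with U+00D7 instead of U+00D6 would be off by exactly one row relative to this list.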

Upvotes: 6

cherouvim

Reputation: 31903

version 2.5 StringEscapeUtils.unescapeHtml prints Ö

version 3.0-beta StringEscapeUtils.unescapeHtml3 and StringEscapeUtils.unescapeHtml4 print ×

Generally I'd use the latest stable version (currently 2.5). This looks like a bug, but I couldn't find anything relevant at https://issues.apache.org/jira/browse/LANG

Upvotes: 2

Grodriguez

Reputation: 21995

Perhaps your console cannot show the Ö character. Check the system property file.encoding to see what the default console encoding is.

If your console supports UTF-8 you can try to start the JVM with -Dfile.encoding=utf-8, or you can do this from your application:

System.setOut(new PrintStream(System.out, true, "utf-8"));

If the console does not support UTF-8, I suggest writing the output to a file using UTF-8 encoding instead, then opening the file with a text editor that can handle UTF-8.

If all of this doesn't work, then it is probably a bug in StringEscapeUtils.
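One way to tell a console-encoding problem apart from a wrong result is to print the code points of the returned string instead of the characters themselves, since ASCII hex digits survive any console encoding. Here is a minimal sketch: the class name CodePointDump and the method describe are made up for illustration, and the sample literals stand in for whatever unescapeHtml4 returns.

```java
public class CodePointDump {
    // Renders each code point of s as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-ins for the library result: O with umlaut, then the multiplication sign.
        System.out.println(describe("\u00D6"));
        System.out.println(describe("\u00D7"));
    }
}
```

If the string returned by unescapeHtml4 is reported as U+00D7, the library really produced the multiplication sign and the console is not at fault.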

Upvotes: 0
