Reputation: 4554
I have the following code that loads a null-terminated multi-byte string from a buffer. It first tries to interpret the data as UTF-8 and, if that conversion fails, falls back to interpreting the data as ISO-8859-1. Here is the code:
@Override
public String format(String date_format, boolean use_locale, int precision)
{
    String rtn = null;
    int len = 0;
    for(int i = 0; i < max_len; ++i)
    {
        if(storage[storage_offset + i] != 0)
            ++len;
        else
            break;
    }
    try
    {
        rtn = new String(storage, storage_offset, len, "UTF-8");
    }
    catch(UnsupportedEncodingException e1)
    {
        try
        {
            rtn = new String(storage, storage_offset, len, "ISO-8859-1");
        }
        catch(UnsupportedEncodingException e2)
        { }
    }
    return rtn;
}
My intention is that, if decoding the string as UTF-8 fails, we fall back to ISO-8859-1. This depends on UnsupportedEncodingException being thrown. I ran a test of this code that passes extended characters (byte values of 128 and above) that do not follow the expected UTF-8 pattern. The exception is NOT thrown, and the converted string shows unknown glyphs instead. Has there been any change to the standard library implementation that would cause the exception NOT to be thrown?
Upvotes: 0
Views: 547
Reputation: 4887
You could test if the charset is available.
To get available charsets use:
SortedMap<String, Charset> availableCharsets = Charset.availableCharsets();
for (Map.Entry<String, Charset> entrySet : availableCharsets.entrySet()) {
    String key = entrySet.getKey();
    Charset value = entrySet.getValue();
    System.out.println("key: " + key + " value: " + value.name());
}
System.out.println("The default Charset is: " + Charset.defaultCharset().name());
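If you only need to know whether one specific name resolves, Charset.isSupported is a shorter check than listing everything. A minimal sketch; the class and method names (CharsetCheck, isAvailable) are made up for illustration:

```java
import java.nio.charset.Charset;

class CharsetCheck {
    // Returns true if this JVM can resolve the given charset name.
    // Hypothetical helper, not part of the original question's code.
    static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for a null name or a syntactically illegal one
            // (IllegalCharsetNameException is a subclass).
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("UTF-8: " + isAvailable("UTF-8"));
        System.out.println("ISO-8859-1: " + isAvailable("ISO-8859-1"));
        System.out.println("no-such-charset: " + isAvailable("no-such-charset"));
    }
}
```

Note that UTF-8 and ISO-8859-1 are among the standard charsets every Java platform implementation is required to support, so this check will always succeed for the two names in the question.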
Upvotes: 0
Reputation: 11028
According to the docs for that String constructor, UnsupportedEncodingException is only thrown if the specified charsetName is unknown.
"The behavior of this constructor when the given bytes are not valid in the given charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required."
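The unknown glyphs in the question are most likely the Unicode replacement character, U+FFFD. A small sketch that shows this, using the String(byte[], Charset) constructor, which is documented to always replace malformed input with the charset's default replacement string. The class name and the sample bytes are assumptions for illustration:

```java
import java.nio.charset.StandardCharsets;

class MalformedDemo {
    // Decodes leniently: malformed sequences are replaced, no exception is thrown.
    static String lenientUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "café" encoded as ISO-8859-1; the lone 0xE9 byte is not valid UTF-8.
        byte[] latin1 = { 'c', 'a', 'f', (byte) 0xE9 };
        String decoded = lenientUtf8(latin1);
        System.out.println("contains U+FFFD: " + (decoded.indexOf('\uFFFD') >= 0));
    }
}
```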
Upvotes: 2
Reputation: 43391
The UnsupportedEncodingException is thrown if the charset itself is unsupported (that is, you specify a charset and the system doesn't recognize the name), not if the bytes don't encode correctly. Note that the corresponding constructor that takes a java.nio.charset.Charset does not throw that exception (since there's no name to map to a Charset, and thus no possibility that the mapping isn't there).
The docs for String(byte[], int, int, String) specify the behavior (namely, that it's unspecified :) ) and suggest the fix:
"The behavior of this constructor when the given bytes are not valid in the given charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required."
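Here is a rough sketch of what that could look like for the fallback in the question. It is one possible approach, not the only one: a CharsetDecoder set to REPORT throws CharacterCodingException on invalid UTF-8, and that exception triggers the ISO-8859-1 fallback. The class name StrictDecode and the parameter names are assumptions; the null-terminator scan mirrors the loop in the question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecode {
    // Decodes a null-terminated string of at most maxLen bytes starting at offset.
    // Tries strict UTF-8 first and falls back to ISO-8859-1 if the bytes are invalid.
    static String decode(byte[] storage, int offset, int maxLen) {
        // Find the terminator (or stop at maxLen), as in the original loop.
        int len = 0;
        while (len < maxLen && storage[offset + len] != 0) {
            ++len;
        }
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(storage, offset, len))
                    .toString();
        } catch (CharacterCodingException e) {
            // Every byte value maps to a character in ISO-8859-1, so this cannot fail.
            return new String(storage, offset, len, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = { 'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9, 0 };
        byte[] latin1 = { 'c', 'a', 'f', (byte) 0xE9, 0 };
        System.out.println(decode(utf8, 0, utf8.length));
        System.out.println(decode(latin1, 0, latin1.length));
    }
}
```

Using the StandardCharsets constants instead of charset names also removes the need to catch UnsupportedEncodingException at all.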
Upvotes: 2