Sathish Kumar

Reputation: 321

JavaMail throws java.io.UnsupportedEncodingException: unknown-8bit

I am reading emails with the JavaMail library. When a message contains the MIME header Content-Type: text/plain; charset="unknown-8bit", I get this error: java.io.UnsupportedEncodingException: unknown-8bit

Any idea why this is happening?

Upvotes: 1

Views: 2011

Answers (1)

Bill Shannon

Reputation: 29961

Because "unknown-8bit" is not a known charset name. This is explained in the JavaMail FAQ, along with alternatives for handling this problem. I've copied the answer here but note that this may become out of date. Please be sure to search the JavaMail FAQ for any other JavaMail problems you might have.

Q: Why do I get the UnsupportedEncodingException when I invoke getContent() on a bodypart that contains text data?

A: Textual bodyparts (i.e., bodyparts whose type is "text/plain", "text/html", or "text/xml") return Unicode String objects when getContent() is used. Typically, such bodyparts internally hold their textual data in some non-Unicode charset. JavaMail (through the corresponding DataContentHandler) attempts to convert that data into a Unicode string, using the underlying JDK's charset converters. If the JDK does not support a particular charset, the UnsupportedEncodingException is thrown. In this case, you can use the getInputStream() method to retrieve the content as a stream of bytes. For example:

String s = null;
if (part.isMimeType("text/plain")) {
    try {
        // getContent() is declared to return Object, so a cast is required
        s = (String) part.getContent();
    } catch (UnsupportedEncodingException uex) {
        InputStream is = part.getInputStream();
        /*
         * Read the input stream into a byte array.
         * Choose a charset in some heuristic manner, use
         * that charset in the java.lang.String constructor
         * to convert the byte array into a String.
         */
        // convert_to_string is a helper you write yourself, not a JavaMail method
        s = convert_to_string(is);
    } catch (Exception ex) {
        // Handle other exceptions appropriately
    }
}
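
One possible shape for the convert_to_string helper mentioned in the comment above is sketched here. It is only an illustration, not code from the FAQ: the class name TextFallback, the method name convertToString, and the heuristic itself (try strict UTF-8 first, then fall back to ISO-8859-1) are all assumptions. ISO-8859-1 is a convenient last resort because it maps every byte to a character, so decoding never fails, although the result may be wrong for mail that was really written in another 8-bit charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name and the heuristic are assumptions.
final class TextFallback {

    static String convertToString(InputStream is) throws IOException {
        // Read the whole stream into a byte array.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = is.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        byte[] bytes = buf.toByteArray();

        try {
            // Strict UTF-8: report malformed input instead of silently
            // substituting replacement characters.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: ISO-8859-1 accepts any byte sequence.
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }
}
```

In the catch block above you would then write s = TextFallback.convertToString(is). If you know where your mail comes from, a different fallback charset (for example windows-1252) may suit your data better.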

There are some commonly used charsets that the JDK does not yet support. You can find support for some of these additional charsets in the JCharset package at http://www.freeutils.net/source/jcharset/.

You can also add an alias for an existing charset already supported by the JDK so that it will be known by an additional name. You can create a charset provider for the "bad" charset name that simply redirects to an existing charset provider; see the following code. Create an appropriate CharsetProvider subclass and include it along with the META-INF/services file and the JDK will find it. Obviously you could get significantly more clever and redirect all unknown charsets to "us-ascii", for instance.

==> UnknownCharsetProvider.java <==
import java.nio.charset.*;
import java.nio.charset.spi.*;
import java.util.*;

public class UnknownCharsetProvider extends CharsetProvider {
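     // Name to alias; for the header in the question this would be "unknown-8bit"
     private static final String badCharset = "x-unknown";
     // Existing JDK charset that the bad name is redirected to
     private static final String goodCharset = "iso-8859-1";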

     public Charset charsetForName(String charset) {
         if (charset.equalsIgnoreCase(badCharset))
             return Charset.forName(goodCharset);
         return null;
     }

     public Iterator<Charset> charsets() {
         return Collections.emptyIterator();
     }
}

==> META-INF/services/java.nio.charset.spi.CharsetProvider <==
UnknownCharsetProvider
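
The "more clever" variant mentioned above, redirecting every unknown charset name to us-ascii, might look like the following. This is a sketch under assumptions, not code from the FAQ: the class name FallbackCharsetProvider is made up. It relies on the JDK consulting its built-in charsets before any provider registered through META-INF/services, so this provider should only ever be asked about names the JDK does not already recognize.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;

// Hypothetical catch-all provider; the class name is an assumption.
public class FallbackCharsetProvider extends CharsetProvider {

    @Override
    public Charset charsetForName(String charsetName) {
        // Answer every lookup that reaches this provider with US-ASCII.
        return StandardCharsets.US_ASCII;
    }

    @Override
    public Iterator<Charset> charsets() {
        // This provider contributes no charsets of its own.
        return Collections.emptyIterator();
    }
}
```

To register it, the META-INF/services/java.nio.charset.spi.CharsetProvider file would list FallbackCharsetProvider instead of UnknownCharsetProvider. The trade-off is that misspelled or genuinely unsupported charset names no longer raise an exception anywhere in the application, and bytes above 0x7F are decoded as replacement characters, so the single-name alias shown earlier is the safer choice when only one bad name needs handling.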

Upvotes: 5
