Reputation: 19769
I'm reading the messages from an email account using JavaMail 1.4.1 (I've since upgraded to 1.4.5, with the same problem), but I'm having issues with the encoding of the content:
POP3Message pop3message;
...
Object contentObject = pop3message.getContent();
...
String contentType = pop3message.getContentType();
String content = contentObject.toString();
Some messages are read properly, but others contain strange characters because the wrong encoding is applied. I have realized that it fails only for one specific content type.
It works well if the contentType is any of these:
- text/plain; charset=ISO-8859-1
- text/plain; charset="iso-8859-1"
- text/plain; charset="ISO-8859-1"; format="flowed"
- text/plain; charset=windows-1252
but it doesn't if it is:
- text/plain; charset="utf-8"
For this contentType (the UTF-8 one), pop3message.getEncoding() returns
quoted-printable
For these messages, the String value in the debugger (and in the database after persisting the object) looks like this, for example:
UbicaciÃ³n (instead of Ubicación)
But if I open the email with the email client in a browser it can be read without any problem, and it's a normal message (no attachments, just text), so the message seems to be OK.
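A minimal sketch of how this kind of garbling typically arises, assuming the message body really is UTF-8 and something decodes it as ISO-8859-1. The class name `MojibakeDemo` and the hard-coded bytes are illustrative only; they are not taken from the actual message.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // UTF-8 bytes of the example word; the accented o (U+00F3) is the
    // two-byte sequence 0xC3 0xB3, which quoted-printable writes as =C3=B3.
    static final byte[] UTF8_BYTES = {
        'U', 'b', 'i', 'c', 'a', 'c', 'i', (byte) 0xC3, (byte) 0xB3, 'n'
    };

    // Wrong charset: every byte becomes its own character.
    static String decodeAsLatin1() {
        return new String(UTF8_BYTES, StandardCharsets.ISO_8859_1);
    }

    // Right charset: 0xC3 0xB3 collapses into the single character U+00F3.
    static String decodeAsUtf8() {
        return new String(UTF8_BYTES, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeAsLatin1().length()); // 10 characters
        System.out.println(decodeAsUtf8().length());   // 9 characters
    }
}
```

If the String you see has one extra character per accented letter, as in this sketch, the bytes were probably decoded with a single-byte charset somewhere along the way.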
Any idea about how to solve this issue?
Thanks.
UPDATE: This is the code I added to try the getUTF8Content() function suggested by jlordo:
POP3Message pop3message = (POP3Message) message;
String uid = pop3folder.getUID(message);
//START JUST FOR TESTING PURPOSES
if(uid.trim().equals("1401")){
Object utfContent = pop3message.getContent();
System.out.println(utfContent.getClass().getName()); // it is of type String
//System.out.println(utfContent); // if not commented out, it prints the content of one of the emails I'm having problems with.
System.out.println(pop3message.getEncoding()); //prints: quoted-printable
System.out.println(pop3message.getContentType()); //prints: text/plain; charset="utf-8"
String utfContentString = getUTF8Content(utfContent); // throws java.lang.ClassCastException: java.lang.String cannot be cast to javax.mail.util.SharedByteArrayInputStream
System.out.println(utfContentString);
}
//END TEST CODE
Upvotes: 3
Views: 8104
Reputation: 442
First of all, you must set the headers for UTF-8 encoding, like this:
...
MimeMessage msg = new MimeMessage(session);
msg.setHeader("Content-Type", "text/html; charset=UTF-8");
msg.setHeader("Content-Transfer-Encoding", "8bit");
msg.setFrom(new InternetAddress(doConversion(from)));
msg.setRecipients(javax.mail.Message.RecipientType.TO, address);
msg.setSubject(asunto, "UTF-8");
MimeBodyPart mbp1 = new MimeBodyPart();
mbp1.setContent(text, "text/html; charset=UTF-8");
Multipart mp = new MimeMultipart();
mp.addBodyPart(mbp1);
...
But for the 'from' header, I use the following method to convert characters:
public String doConversion(String original) {
if(original == null) return null;
String converted = original.replaceAll("á", "\u00c3\u00a1");
converted = converted.replaceAll("Á", "\u00c3\u0081");
converted = converted.replaceAll("é", "\u00c3\u00a9");
converted = converted.replaceAll("É", "\u00c3\u0089");
converted = converted.replaceAll("í", "\u00c3\u00ad");
converted = converted.replaceAll("Í", "\u00c3\u008d");
converted = converted.replaceAll("ó", "\u00c3\u00b3");
converted = converted.replaceAll("Ó", "\u00c3\u0093");
converted = converted.replaceAll("ú", "\u00c3\u00ba");
converted = converted.replaceAll("Ú", "\u00c3\u009a");
converted = converted.replaceAll("ñ", "\u00c3\u00b1");
converted = converted.replaceAll("Ñ", "\u00c3\u0091");
converted = converted.replaceAll("€", "\u00e2\u0082\u00ac"); // U+20AC is E2 82 AC in UTF-8
converted = converted.replaceAll("¿", "\u00c2\u00bf");
converted = converted.replaceAll("ª", "\u00c2\u00aa");
converted = converted.replaceAll("º", "\u00c2\u00ba"); // U+00BA is C2 BA in UTF-8
return converted;
}
If you need to include other characters, you can look up their UTF-8 hex encoding at http://www.fileformat.info/info/charset/UTF-8/list.htm.
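The lookup table above can probably be generalized. As a sketch, and assuming the intent is "each UTF-8 byte of the input becomes one character of the output", the same mapping falls out of encoding to UTF-8 and decoding as ISO-8859-1. The class name `FromHeaderConversion` is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class FromHeaderConversion {
    // Assumed-equivalent, table-free variant of doConversion: encode the
    // text as UTF-8, then reinterpret each byte as one ISO-8859-1 character.
    static String doConversion(String original) {
        if (original == null) {
            return null;
        }
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Prints true: a-acute (U+00E1) maps to the pair U+00C3 U+00A1.
        System.out.println(doConversion("\u00e1").equals("\u00c3\u00a1"));
    }
}
```

This covers every character without further table entries. JavaMail also has a three-argument InternetAddress constructor that takes the address, a personal name and a charset, which may make the manual conversion unnecessary.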
Upvotes: 0
Reputation: 1861
What worked for me was calling getContentType() and checking whether the returned String contains "utf" (meaning the charset is one of the UTF family).
If it does, I treat the content differently:
private String encodeCorrectly(InputStream is) {
java.util.Scanner s = new java.util.Scanner(is, StandardCharsets.UTF_8.name()).useDelimiter("\\A");
return s.hasNext() ? s.next() : "";
}
(a modification of an InputStream-to-String converter from this answer on SO)
The important part here is using the correct Charset. This solved the issue for me.
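Rather than only checking for "utf", the charset can be read out of the Content-Type value and used for decoding. The following is a rough sketch using only the JDK. The class `ContentTypeCharset`, its method names and the regular expression are assumptions made for illustration, and the pattern is not a full MIME parameter parser.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContentTypeCharset {
    // Matches charset=utf-8 as well as charset="utf-8", case-insensitively.
    private static final Pattern CHARSET_PARAM = Pattern.compile(
            "charset\\s*=\\s*\"?([^\";\\s]+)\"?", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in the Content-Type value, or the fallback
    // if the parameter is missing or names a charset this JVM does not know.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET_PARAM.matcher(contentType);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return fallback;
        }
    }

    // Decodes raw body bytes using the charset declared in the header.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        String header = "text/plain;\n charset=\"utf-8\"";
        System.out.println(charsetOf(header, StandardCharsets.US_ASCII));
    }
}
```

With JavaMail on the classpath, javax.mail.internet.ContentType and its getParameter("charset") method are likely a more robust way to get the same value.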
Upvotes: 0
Reputation: 37813
Try this and let me know if it works:
if (contentType != null && contentType.toLowerCase().contains("utf-8")) {
content = getUTF8Content(contentObject);
}
// TODO take care of IOException and ClassCastException
public static String getUTF8Content(Object contentObject) {
// possible ClassCastException
SharedByteArrayInputStream sbais = (SharedByteArrayInputStream) contentObject;
// Charset.forName can throw the unchecked UnsupportedCharsetException, though UTF-8 is always available
InputStreamReader isr = new InputStreamReader(sbais, Charset.forName("UTF-8"));
int charsRead = 0;
StringBuilder content = new StringBuilder();
int bufferSize = 1024;
char[] buffer = new char[bufferSize];
// possible IOException
while ((charsRead = isr.read(buffer)) != -1) {
content.append(buffer, 0, charsRead);
}
return content.toString();
}
BTW, is JavaMail 1.4.1 a requirement? The current version is 1.4.5.
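Since getContent() can return a String rather than a stream (the ClassCastException in the question's update suggests exactly that), a more defensive variant could check the runtime type before casting. This is only a sketch: the class `ContentReader` and the method `toUtf8String` are hypothetical names, and it assumes that any stream content is UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ContentReader {
    // Accepts whatever getContent() handed back: a String is returned as-is,
    // an InputStream is read fully and decoded as UTF-8 (an assumption).
    static String toUtf8String(Object content) {
        if (content instanceof String) {
            return (String) content;
        }
        if (content instanceof InputStream) {
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[1024];
            try (Reader reader = new InputStreamReader(
                    (InputStream) content, StandardCharsets.UTF_8)) {
                int charsRead;
                while ((charsRead = reader.read(buffer)) != -1) {
                    sb.append(buffer, 0, charsRead);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return sb.toString();
        }
        throw new IllegalArgumentException("Unsupported content: "
                + (content == null ? "null" : content.getClass().getName()));
    }

    public static void main(String[] args) {
        byte[] utf8 = "Ubicaci\u00f3n".getBytes(StandardCharsets.UTF_8);
        String text = toUtf8String(new ByteArrayInputStream(utf8));
        System.out.println(text.length()); // 9 characters
    }
}
```

If the content is already a String, the decoding has happened inside JavaMail, so re-reading it as a stream cannot repair it; in that case the charset has to be corrected earlier.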
Upvotes: 0
Reputation: 29961
How are you detecting that these messages have "strange characters"? Are you displaying the data somewhere? It's possible that whatever method you're using to display the data isn't handling Unicode characters properly.
The first step is to determine whether the problem is that you're getting the wrong characters, or that the correct characters are being displayed incorrectly. You can examine the Unicode values of each character in the data (e.g., in the String returned from the getContent method) to make sure each character has the correct Unicode value. If it does, the problem is with the method you're using to display the characters.
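One way to carry out that check, sketched here with a hypothetical helper class `CodePointDump`, is to print each character's code point so the result does not depend on how the console or debugger renders the text:

```java
class CodePointDump {
    // Renders every code point of the string as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Correctly decoded text: a single U+00F3 for the accented o.
        System.out.println(dump("Ubicaci\u00f3n"));
        // Wrongly decoded text: U+00C3 followed by U+00B3 instead.
        System.out.println(dump("Ubicaci\u00c3\u00b3n"));
    }
}
```

If the String returned by getContent() shows U+00F3, the data is correct and the display path is at fault. If it shows U+00C3 U+00B3, the characters were already wrong when they were decoded.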
Upvotes: 1