Robert Niestroj

Reputation: 16131

Java - Detect different encoding in String

I'm generating an HTML email in Java and sending it through Apache Commons Email. My mails are sent in UTF-8 and work OK in MS Outlook and GMail, but I have an issue with the Polish email provider Wirtualna Polska [ http://wp.pl/ ]. Their online email client complains: "Detected different encodings in the email content". How can I detect in Java which characters or words in a string have a different encoding than the rest?


In case it matters: the email is an HTML email and has 4 images embedded.

Finally, when my email is ready, I do this to force UTF-8:

return org.apache.commons.codec.binary.StringUtils.newStringUtf8(mail.getBytes(StandardCharsets.UTF_8));

But it does not help.

Upvotes: 0

Views: 395

Answers (1)

Peter Lamby

Reputation: 295

As far as the API is concerned, Java Strings are always UTF-16. That is a Unicode encoding in which each code point (more or less a character) takes one or two 16-bit code units, i.e. 2 or 4 bytes.
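To make the UTF-16 point concrete, here is a small sketch using only the standard library. The class and method names (Utf16Demo, codeUnits, codePoints) are made up for illustration. It shows that a Polish letter fits in one char, while a character outside the Basic Multilingual Plane needs two.

```java
import java.nio.charset.StandardCharsets;

final class Utf16Demo {
    // Number of 16-bit UTF-16 code units (Java chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String polish = "\u0142";       // the letter l with stroke, U+0142
        String emoji = "\uD83D\uDE00";  // U+1F600, stored as a surrogate pair

        System.out.println(codeUnits(polish) + " unit(s), " + codePoints(polish) + " code point(s)");
        System.out.println(codeUnits(emoji) + " unit(s), " + codePoints(emoji) + " code point(s)");

        // The byte count only appears once a charset is chosen for encoding.
        System.out.println(polish.getBytes(StandardCharsets.UTF_8).length + " byte(s) in UTF-8");
    }
}
```

Either way, the String carries no per-character "encoding" that could differ from one word to the next.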

A String itself therefore has no byte encoding to detect. You need to specify the encoding when you write a String to an output or read one from an input.
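This is also why the "force UTF8" line in the question has no effect. A minimal sketch, assuming that newStringUtf8 behaves like the plain JDK call new String(bytes, StandardCharsets.UTF_8) used here as a stand-in (RoundTripDemo and roundTrip are hypothetical names):

```java
import java.nio.charset.StandardCharsets;

final class RoundTripDemo {
    // Encodes the String to UTF-8 bytes and decodes them straight back,
    // mirroring the round trip done in the question.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample HTML fragment with Polish letters, written as escapes.
        String mail = "<p>Za\u017C\u00F3\u0142\u0107 g\u0119\u015Bl\u0105 ja\u017A\u0144</p>";

        // The result is an equal String; nothing about the outgoing mail changes.
        System.out.println(roundTrip(mail).equals(mail));
    }
}
```

The charset only matters at the point where the mail is actually serialized, so that is where it has to be set.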

Most read and write methods take an optional parameter for the encoding. If it is not specified, the platform default charset is usually used, which depends on your OS and JVM settings (since Java 18 the default is UTF-8 unless overridden).
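As an illustration of passing the encoding explicitly, the following sketch writes the same text through an OutputStreamWriter with two different charsets. ExplicitCharsetDemo and encode are hypothetical names; only JDK classes are used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetDemo {
    // Writes the text through a Writer bound to an explicit charset
    // and returns the bytes that were produced.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u00F3"; // the letter o with acute, U+00F3

        // What you get when no charset is passed depends on the JVM.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Same String, different bytes depending on the charset chosen.
        System.out.println("UTF-8 bytes: " + encode(text, StandardCharsets.UTF_8).length);
        System.out.println("ISO-8859-1 bytes: " + encode(text, StandardCharsets.ISO_8859_1).length);
    }
}
```

If any part of the mail (body, subject, an embedded part) is written without an explicit charset, it may end up in the platform default instead of UTF-8.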

When you are writing mostly ASCII characters, the output may look like valid UTF-8 even if it uses a different encoding such as ASCII or CP-1252, because ASCII characters have the same byte values in all of them. That may be the reason some mail providers accept your mails as valid UTF-8.
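A rough way to check this on the bytes you actually send is to decode them with a strict UTF-8 decoder. The sketch below is only an assumption about how one might test it, not something the mail provider is known to do; Utf8CheckDemo and isValidUtf8 are hypothetical names, and ISO-8859-1 stands in for "some other single-byte encoding".

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8CheckDemo {
    // Returns true only if the bytes decode as UTF-8 without any error.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pure ASCII text produces identical bytes in both charsets.
        byte[] asciiLatin1 = "Hello".getBytes(StandardCharsets.ISO_8859_1);
        byte[] asciiUtf8 = "Hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(asciiLatin1, asciiUtf8));

        // A non-ASCII letter encoded as ISO-8859-1 is not valid UTF-8.
        byte[] latin1 = "\u00F3".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "\u00F3".getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(latin1));
        System.out.println(isValidUtf8(utf8));
    }
}
```

Running a check like this over each part of the raw message would show which part, if any, was written in something other than UTF-8.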

Upvotes: 1
