David Balažic

Java mail sender address gets non-ASCII chars removed

MimeMessage mm;
mm.setFrom(new InternetAddress("\"Test\u0161\u0160\" <test@example.net>"));

The above code, in a web application deployed to WebLogic, corrupts the non-ASCII characters (they are 'š' and 'Š'), but only in some server environments. A network capture of the mail being sent to the SMTP server shows the following in the message headers:

   0x0060:  0a46 726f 6d3a 2022 5465 7374 2061 2060  .From:."Test.a.`
   0x0070:  203c 7465 7374 4065 7861 6d70 6c65 2e6e  .<test@example.n
   0x0080:  6574 3e0d 0a                             et>..

So the code \u0161 was "converted" to 20 61 and \u0160 was "converted" to 20 60.
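One observation that may help narrow this down: 61 and 60 are exactly the low 8 bits of U+0161 and U+0160. The sketch below (class and method names are made up for illustration) shows that a plain char-to-byte cast produces those values. Whether such a cast is really what happens on the "bad" system is an assumption, and it does not account for the 20 byte that precedes each character in the capture.

```java
final class LowByteDemo {
    /** Narrows each UTF-16 char to its low 8 bits, as a naive (byte) cast does. */
    static byte[] lowBytes(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i); // the high byte is silently dropped
        }
        return out;
    }

    /** Formats bytes as space-separated lowercase hex, like the capture above. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(lowBytes("\u0161\u0160"))); // prints "61 60"
    }
}
```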

Where should I start looking? I suspect some environment setting, although it shouldn't matter: Java uses Unicode internally, and all data is correct until it leaves the JVM. Also, talking to an SMTP server should follow established conventions for encoding text.
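As a starting point for comparing environments, the charset-related settings could be logged on both servers. This is only a diagnostic sketch; the class name is hypothetical, and `mail.mime.charset` is the JavaMail system property for the default MIME charset, which is usually unset.

```java
import java.nio.charset.Charset;
import java.util.Locale;

final class CharsetDiagnostics {
    /** Collects the JVM settings that most often differ between servers. */
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
            + ", defaultCharset=" + Charset.defaultCharset().name()
            // JavaMail consults this property when present; null means "not set".
            + ", mail.mime.charset=" + System.getProperty("mail.mime.charset")
            + ", locale=" + Locale.getDefault();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this on the "good" and "bad" systems and diffing the output would show whether the two JVMs are configured differently at all.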

The actual string is read from a properties file where it is defined as:

mailFrom="Test\u0161\u0160" <test@example.net>

It is read correctly, as the debug log output confirms.
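For anyone who wants to verify this step without relying on how the log file itself is encoded, here is a small sketch that loads the same line through `java.util.Properties` and prints the non-ASCII code points. The class and helper names are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropertiesEscapeCheck {
    /** Loads properties text and returns the mailFrom value with escapes decoded. */
    static String loadMailFrom(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("mailFrom");
    }

    /** Lists every non-ASCII char as U+XXXX so the result does not depend on console encoding. */
    static String nonAsciiCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c > 127) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(String.format("U+%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes literal, as they appear in the file.
        String line = "mailFrom=\"Test\\u0161\\u0160\" <test@example.net>\n";
        System.out.println(nonAsciiCodePoints(loadMailFrom(line))); // prints "U+0161 U+0160"
    }
}
```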

I added some more debug output (run after calling mm.setFrom(fromAddress)):

logger.info("set:{}, get:{}",  fromAddress, mm.getFrom()[0].toString());

On a "good" system (Sun Java 1.6.0_45, Tomcat 6.0.44) it prints:

set:"TestšŠ" <test@example.net>, get:=?UTF-8?Q?Test=C5=A1=C5=A0?= <test@example.net>

On a "bad" system (WebLogic 10.3.5, JRockit 1.6.0_26) it prints:

set:"TestšŠ" <test@example.net>, get:"TestšŠ" <test@example.net>

So the "bad" system apparently fails to encode the non-ASCII characters. Is this a bug in the JRE?

More info:

It turns out WebLogic uses javax.mail_1.1.0.0_1-4-1.jar, while Tomcat uses geronimo-javamail_1.4_spec-1.7.1.jar.
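One way to confirm which jar a class really comes from at runtime is to ask its protection domain. The sketch below uses only the JDK; the class name is hypothetical, and in the web application the argument would be `InternetAddress.class` rather than the stand-ins used here.

```java
import java.security.CodeSource;

final class JarLocator {
    /** Returns where the given class was loaded from, or a marker for JDK classes. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes from the JDK itself have no code source.
        return source == null ? "(bootstrap class path)" : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) {
        // In the application: locationOf(javax.mail.internet.InternetAddress.class)
        System.out.println(locationOf(JarLocator.class));
        System.out.println(locationOf(String.class));
    }
}
```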

Answers (1)

Bill Shannon

The InternetAddress constructor that takes a single String expects that string to be in proper MIME format, which means all ASCII with any non-ASCII characters encoded.

If instead you use the InternetAddress constructor that takes separate email address and person name strings, it will encode the non-ASCII characters in the personal name for you.
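To illustrate what "encoded" means here, this is a rough JDK-only sketch of RFC 2047 Q-encoding for a personal name. It is not JavaMail's implementation: it encodes more conservatively than the RFC requires and ignores the 75-character limit on encoded words. The class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

final class QEncodingSketch {
    /** Wraps the text in a single UTF-8 Q-encoded word (no length limit handling). */
    static String encodeWord(String text) {
        StringBuilder sb = new StringBuilder("=?UTF-8?Q?");
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xff;
            boolean safe = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z') || (v >= '0' && v <= '9');
            if (safe) {
                sb.append((char) v);          // ASCII letters and digits pass through
            } else if (v == ' ') {
                sb.append('_');               // Q-encoding writes a space as underscore
            } else {
                sb.append('=').append(String.format("%02X", v)); // everything else as =XX
            }
        }
        return sb.append("?=").toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeWord("Test\u0161\u0160")); // prints "=?UTF-8?Q?Test=C5=A1=C5=A0?="
    }
}
```

The output matches what the "good" system logged in the question, which is the form an SMTP header is expected to carry.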

Unfortunately, I see that you're storing the addresses in a properties file in this non-MIME format. If you can't change the format of the data in the properties file, your best bet may be to parse the data to extract the different fields, then use the two-string InternetAddress constructor:

    InternetAddress ia =
        new InternetAddress("\"Test\u0161\u0160\" <test@example.net>");
    InternetAddress ia2 =
        new InternetAddress(ia.getAddress(), ia.getPersonal());
    System.out.println(ia2);

which gives:

=?UTF-8?Q?Test=C5=A1=C5=A0?= <test@example.net>

This happens to work because the current implementation of the MIME-string InternetAddress constructor does not reject illegal non-ASCII characters.
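If you would rather not depend on that leniency, the two fields could be split by hand before they reach JavaMail. The following is a minimal sketch using a regular expression; `MailboxSplitter` is a hypothetical helper, and it only handles the simple `"Name" <address>` shape from the properties file, not escaped quotes, comments, or address groups.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MailboxSplitter {
    // Optional quoted or bare display name, followed by the address in angle brackets.
    private static final Pattern MAILBOX =
        Pattern.compile("^\\s*\"?([^\"<]*?)\"?\\s*<([^<>]+)>\\s*$");

    /** Returns {personal, address}; throws if the input is not in the expected shape. */
    static String[] split(String mailbox) {
        Matcher m = MAILBOX.matcher(mailbox);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized mailbox format: " + mailbox);
        }
        return new String[] { m.group(1).trim(), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String[] parts = split("\"Test\u0161\u0160\" <test@example.net>");
        // With JavaMail on the class path, these would then be passed on as:
        //   new InternetAddress(parts[1], parts[0], "UTF-8")
        System.out.println("personal=" + parts[0] + ", address=" + parts[1]);
    }
}
```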
