abeger

Reputation: 6866

Smart quotes in a MimeMessage not showing up correctly in Outlook

Our application takes text from a web form and sends it via email to an appropriate user. However, when someone copy/pastes in the infamous "smart quotes" or other special characters from Word, things get hairy.

The user types in

he said “hello” to me—isn’t that nice?

But when the message appears in Outlook 2003, it comes out like this:

he said hello to meisnt that nice?

The code for this was:

Session session = Session.getInstance(props, new MailAuthenticator());
Message msg = new MimeMessage(session);

//removed setting to/from addresses to simplify

msg.setSubject(subject);
msg.setText(text);
msg.setHeader("X-Mailer", MailSender.class.getName());
msg.setSentDate(new Date());
Transport.send(msg);

After a little research, I figured this was probably a character encoding issue and attempted to move things to UTF-8. So, I updated the code thusly:

Session session = Session.getInstance(props, new MailAuthenticator());
MimeMessage msg = new MimeMessage(session);

//removed setting to/from addresses to simplify

msg.setHeader("X-Mailer", MailSender.class.getName());
msg.addHeader("Content-Type", "text/plain");
msg.addHeader("charset", "UTF-8");
msg.setSentDate(new Date());
Transport.send(msg);

This got me closer, but no cigar:

he said “hello” to me—isn’t that nice?

I can't imagine this is an uncommon problem--what have I missed?

Upvotes: 1

Views: 3017

Answers (4)

dave wanta

Reputation: 7214

IIRC, MS Office quotes are found in the character set "iso-8859-1".

Upvotes: 0

Piskvor left the building

Reputation: 92752

Is the page with your form also using UTF-8, or a different charset? If you don't specify the webpage charset, the format of data coming to your script is anyone's guess.


Edit: the charset in the message should be set like this:

msg.addHeader("Content-Type", "text/plain; charset=UTF-8");

since charset is not a separate header, but a parameter of the Content-Type header.
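To illustrate what a charset mismatch looks like, here is a minimal sketch using only the standard library. It assumes the text is written out as UTF-8 and then read back as windows-1252, which is one plausible explanation for the second output in the question, not a confirmed diagnosis. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the asker's application.
final class MojibakeDemo {

    // Encodes text as UTF-8, then decodes the same bytes as windows-1252,
    // which is roughly what a reader does when it never sees the charset.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // U+201C is the left smart quote, U+2014 the long dash,
        // U+2019 the curly apostrophe.
        System.out.println(misread("\u201chello"));
        System.out.println(misread("me\u2014isn\u2019t"));
    }
}
```

Each smart character becomes three UTF-8 bytes, and each of those bytes is then shown as its own windows-1252 character, so the quotes and dash turn into three-character runs starting with "â€", much like the garbled line in the question.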

Upvotes: 1

McDowell

Reputation: 108869

I would check that the data being received from the browser is correct - dump the Unicode code points and check them against the charts:

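  // Prints each Unicode code point of s in hexadecimal, one per line.
  public static void printCodepoints(char[] s) {
    for (int i = 0; i < s.length; i++) {
      int codePoint = s[i];
      // Combine a surrogate pair into one code point; the bounds check
      // guards against a dangling high surrogate at the end of the input.
      if (Character.isHighSurrogate(s[i]) && i + 1 < s.length
          && Character.isLowSurrogate(s[i + 1])) {
        codePoint = Character.toCodePoint(s[i], s[++i]);
      }
      System.out.println(Integer.toHexString(codePoint));
    }
  }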

For example, the symbol LEFT DOUBLE QUOTATION MARK (“) is character U+201C.

It has been a long time since I used the mail API, but the MimeMessage.setText(text, charset) method might be worth a look. The documentation on setText(String) says it uses the default character set (probably windows-1252 if you're using English/Latin-1 Windows).
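Since the outcome depends on which charset ends up being used, it may help to check which charsets can represent the smart quotes at all. The following is a rough sketch using only java.nio.charset; the class and method names are made up for illustration, and it does not touch the mail API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class CharsetCheck {

    // True if every character of text can be encoded in the named charset.
    static boolean canEncode(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    // Encodes and decodes with the same charset; characters the charset
    // cannot represent come back as its replacement, '?' for ISO-8859-1.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String quoted = "\u201chello\u201d"; // hello wrapped in U+201C and U+201D
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        for (String name : new String[] {"US-ASCII", "ISO-8859-1", "windows-1252", "UTF-8"}) {
            System.out.println(name + " can encode smart quotes: " + canEncode(quoted, name));
        }
        System.out.println(roundTrip(quoted, StandardCharsets.ISO_8859_1));
    }
}
```

On a typical JVM this should report that windows-1252 and UTF-8 can encode the quotes while US-ASCII and ISO-8859-1 cannot, which is one reason to pass an explicit charset such as "UTF-8" rather than rely on the platform default.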

Upvotes: 0

Daniel A. White

Reputation: 190907

Why don't you replace the nice quotes with regular prime quotes?
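As a rough sketch of that approach, the substitution could be done before the text is handed to the mail code. The class name, method name, and the exact set of characters handled are assumptions for illustration; Word can insert other characters that this does not cover.

```java
// Hypothetical helper for illustration; the mapping covers only the
// characters from the question plus a few close relatives.
final class SmartPunctuation {

    // Replaces common Word "smart" punctuation with plain ASCII equivalents.
    static String toAscii(String text) {
        return text
                .replace('\u2018', '\'')    // U+2018 left single quote
                .replace('\u2019', '\'')    // U+2019 right single quote / apostrophe
                .replace('\u201c', '"')     // U+201C left double quote
                .replace('\u201d', '"')     // U+201D right double quote
                .replace("\u2013", "-")     // U+2013 en dash
                .replace("\u2014", "--")    // U+2014 em dash
                .replace("\u2026", "...");  // U+2026 ellipsis
    }

    public static void main(String[] args) {
        String input = "he said \u201chello\u201d to me\u2014isn\u2019t that nice?";
        System.out.println(toAscii(input));
    }
}
```

This sidesteps the encoding question for these particular characters, but it changes what the user typed and does nothing for accented letters or other non-ASCII input, so it is probably better treated as a workaround than a fix.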

Upvotes: 0
