I need to create a TIdMessage with a Unicode subject (e.g. "本語 - test").
I have tried setting it using
Msg.Subject := UTF8Encode(subject);
where subject is a WideString containing the text above, but when I look at the encoded subject (by saving the message to a file) it looks like this:
Subject: =?UTF-8?Q?=C3=A6=C5=93=C2=AC=C3=A8=C2=AA=C5=BE?= - test
instead of
Subject: =?UTF-8?Q?=E6=0C=AC=E8=AA=9E?= - test
and Outlook displays it as "æœ¬èªž - test"
Any pointers as to where I am going wrong?
Delphi 2006 (pre-unicode), Indy 10 (fairly recent from source)
Reputation: 595827
In pre-Unicode versions of Delphi, where everything is based on AnsiString, the value you assign to the TIdMessage.Subject property (and any other AnsiString property of TIdMessage, for that matter) MUST be encoded using the OS default character encoding. You are encoding it to UTF-8 instead, which will not work. This is because TIdMessage will first decode the Subject value to Unicode using the OS default encoding, then MIME-encode the Unicode data using the encoding parameters provided by the TIdMessage.OnInitializeISO event, or defaults if no event handler is assigned (in this case, those parameters are CharSet=UTF-8 and HeaderEncoding=QuotedPrintable). Your UTF-8 bytes are therefore misread as individual characters of the OS code page and then encoded to UTF-8 a second time, which is the doubled output you are seeing. TIdMessage has no mechanism to allow you to specify the encoding used for any AnsiString data you assign to it. So the only way to send a value of '本語 - test' through the Subject property is to assign your source WideString as-is to the property and let the RTL convert the data to AnsiString using the OS default encoding:
Msg.Subject := subject;
However, if the OS default encoding cannot represent the Unicode characters being used, data will be lost. There is no avoiding that in this scenario.
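For reference, the encoding parameters mentioned above can be supplied explicitly through an OnInitializeISO handler. The fragment below is only a sketch: TMailDefaults is a hypothetical holder class introduced for illustration, and the handler signature is assumed to match the event type declared in your copy of IdMessage.pas, so check it against your Indy source. It does not change how the Subject AnsiString is decoded; it only sets the charset and header encoding used for the MIME-encoding step.

```delphi
uses
  IdMessage;

type
  // Hypothetical holder class: the event is "of object", so the handler
  // has to be a method of some object rather than a plain procedure.
  TMailDefaults = class
  public
    procedure InitializeISO(var VHeaderEncoding: Char; var VCharSet: string);
  end;

// Assumed signature; verify against TOnInitializeISOEvent in IdMessage.pas.
procedure TMailDefaults.InitializeISO(var VHeaderEncoding: Char; var VCharSet: string);
begin
  VHeaderEncoding := 'Q';   // 'Q' = quoted-printable, 'B' = base64
  VCharSet := 'UTF-8';      // charset named in the encoded-word
end;

// Usage, where Defaults is a TMailDefaults instance that outlives Msg:
//   Msg.OnInitializeISO := Defaults.InitializeISO;
```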
The alternative is to set the Subject property to a blank string and then use the TIdMessage.ExtraHeaders property instead, so that you can provide your own header value that will be put into the email as-is. Using this approach, you can call Indy's EncodeHeader() function directly. In pre-Unicode versions of Delphi, it has an optional ASrcEncoding parameter that defaults to the OS default encoding (TIdMessage does not currently provide a value for that parameter when encoding headers):
uses
..., IdCoderHeader;
Msg.Subject := '';
Msg.ExtraHeaders.Values['Subject'] := EncodeHeader(UTF8Encode(subject), '', 'Q', 'UTF-8', IndyTextEncoding_UTF8);
This way, EncodeHeader() will be able to avoid a redundant conversion because it can detect that the source and target character encodings are both UTF-8, and thus just MIME-encode the source UTF-8 data as-is. Worst case, even if it did not detect that the character encodings were the same, it would simply decode the source data to Unicode using UTF-8 and then re-encode it back to UTF-8. Those are lossless conversions, so no data is lost.
And FYI, the correct encoding for the Unicode characters you have shown would be:
Subject: =?UTF-8?Q?=E6=9C=AC=E8=AA=9E?= - test
not
Subject: =?UTF-8?Q?=E6=0C=AC=E8=AA=9E?= - test
as you have shown. Notice that the second encoded octet is 9C, not 0C.
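If you want to check the octets yourself, a small console program along these lines prints them. It is a minimal sketch that uses only the RTL, not Indy; HexEscape is a hypothetical helper written for this illustration. The second half assumes a pre-Unicode Delphi, where assigning a UTF8String to a WideString converts through the OS default code page, so its output depends on the machine. On a Windows-1252 system it should reproduce the doubled sequence from the question.

```delphi
program SubjectOctets;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: writes every byte as =XX, which is always
// a valid way to represent an octet inside a Q-encoded word.
function HexEscape(const Bytes: UTF8String): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(Bytes) do
    Result := Result + '=' + IntToHex(Ord(Bytes[I]), 2);
end;

var
  Subject, Misread: WideString;
  Utf8: UTF8String;
begin
  // U+672C and U+8A9E, built from code points so that the
  // encoding of this source file does not matter.
  SetLength(Subject, 2);
  Subject[1] := WideChar($672C);
  Subject[2] := WideChar($8A9E);

  Utf8 := UTF8Encode(Subject);
  Writeln('correct: ', HexEscape(Utf8));   // =E6=9C=AC=E8=AA=9E

  // Pre-Unicode Delphi only: this assignment treats the UTF-8 bytes as
  // text in the OS default code page, mimicking what happens when
  // UTF-8 data is stored in an AnsiString property such as Subject.
  Misread := Utf8;
  Writeln('doubled: ', HexEscape(UTF8Encode(Misread)));
end.
```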