Michael Z

Reputation: 4013

IMAP SEARCH CHARSET with ISO-8859-1

I can't understand what encoding approach Thunderbird uses when searching on an IMAP server with the IMAP SEARCH CHARSET command.

I tried searching for the Russian word "привет", and it was mapped to "?@825B", i.e.

A001 SEARCH CHARSET ISO-8859-1 BODY "?@825B"

How does that happen? I'm sure this is what was sent, as I used a sniffer to capture it, and the Dovecot server correctly found the mail containing the word "привет". The ISO-8859-1 encoding has no Russian glyphs at all! So how was it converted?

For example, "привет" (written as Unicode characters) gives "??????" when converted to ISO-8859-1 on my machine or at http://www.motobit.com/util/charset-codepage-conversion.asp

Upvotes: 2

Views: 1615

Answers (1)

jstedfast

Reputation: 38653

Thunderbird is getting this value by downcasting each (presumably 16-bit) Unicode character to a byte, i.e. keeping only the low 8 bits and discarding the rest.

For example, in C# (which uses UTF-16 internally for its char and string types), this would produce the result you are seeing:

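const string text = "привет";

// Keep only the low 8 bits of each UTF-16 code unit.
// unchecked keeps the truncation silent even if overflow checking is enabled.
var buffer = new char[text.Length];
for (int i = 0; i < text.Length; i++)
    buffer[i] = (char) unchecked((byte) text[i]);

var result = new string (buffer); // "?@825B"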
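
To see where each character of "?@825B" comes from, here is a small sketch that prints every UTF-16 code unit of the word next to its low byte. The class name LowByteDemo is only illustrative, and this demonstrates the arithmetic, not Thunderbird's actual code.

```csharp
using System;

class LowByteDemo
{
    static void Main()
    {
        const string text = "привет";

        foreach (char c in text)
        {
            // Keep only the low 8 bits of the UTF-16 code unit.
            byte low = unchecked((byte) c);

            // Print the code unit, the surviving byte, and that byte as a character.
            Console.WriteLine($"U+{(int) c:X4} -> 0x{low:X2} '{(char) low}'");
        }
    }
}
```

Each of these Cyrillic letters sits in the U+04xx block (U+043F, U+0440, U+0438, U+0432, U+0435, U+0442), so dropping the high byte 0x04 leaves 0x3F, 0x40, 0x38, 0x32, 0x35 and 0x42, which are the ASCII characters ?, @, 8, 2, 5 and B.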

How Thunderbird handles surrogate pairs is anyone's guess, given only what is known from the question. It might treat a surrogate pair as two separate characters (as my code above would), or it might combine them into a 32-bit Unicode character and downcast that to a byte.
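
The two possibilities give different output, which a quick sketch can show. The sample character U+1F642 and the class name SurrogateDemo are my own choices for illustration; nothing here is taken from Thunderbird's source.

```csharp
using System;

class SurrogateDemo
{
    static void Main()
    {
        // U+1F642 lies outside the BMP, so UTF-16 stores it as the surrogate
        // pair D83D DE42 (written with escapes so the source file's encoding
        // does not matter).
        const string text = "\uD83D\uDE42";

        // Possibility 1: truncate each 16-bit code unit on its own.
        foreach (char c in text)
            Console.WriteLine($"code unit  U+{(int) c:X4} -> 0x{unchecked((byte) c):X2}");

        // Possibility 2: combine the pair into one code point, then truncate.
        int codePoint = char.ConvertToUtf32(text, 0);
        Console.WriteLine($"code point U+{codePoint:X} -> 0x{unchecked((byte) codePoint):X2}");
    }
}
```

Truncating per code unit yields two bytes, 0x3D and 0x42 ("=B"), while truncating the combined code point yields the single byte 0x42 ("B"). Capturing a search for a character outside the BMP with a sniffer would show which of the two Thunderbird actually does.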

Upvotes: 0
