XDS

Reputation: 4188

IMAP search command with UTF-8 charset in C#

C# Imap search command with special characters like á,é

I am trying to implement the logic described in the post above in C# to run non-ASCII searches against Gmail. After logging in successfully to imap.gmail.com, I have the following exchange with the server:

(C -> S) Encoding.Default.GetBytes("A4 UID SEARCH CHARSET UTF-8 TEXT {4}\r\n");
(C <- S) "+ go ahead\r\n"
(C -> S) Encoding.Default.GetBytes("αβγδ\r\n");
(C <- S) "* SEARCH 72\r\nA2 OK SEARCH completed (Success)"

However, the email identified by the server's response is completely irrelevant to the search term I provided. This only happens when the keywords contain non-ASCII characters, so I believe something is wrong with the encoding.

I have also tried using Encoding.ASCII, but then the search results are even further off target.

What is the proper way to send the string literal "αβγδ\r\n"?

Upvotes: 1

Views: 2325

Answers (1)

nosid

Reputation: 50044

For the search term, you are using a so-called literal. The length of a literal has to be specified in octets, which is not the case in your example: the string "αβγδ" encoded in UTF-8 consists of eight octets (two per character), not four.

So you should encode the search term first and then send the length of the encoded bytes to the server.

I don't know much about C#, so here is an example in Python:

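# send() and read_until() are assumed to be thin helpers around the
# already authenticated connection; they are not library functions.
search_term = 'Grüße'
encoded_search_term = search_term.encode('UTF-8')
# The literal length is the number of encoded octets, not characters.
length = str(len(encoded_search_term)).encode('ascii')

send(b'. UID SEARCH CHARSET UTF-8 TEXT {' + length + b'}\r\n')
read_until(br'^\+ .*$')  # wait for the continuation request

send(encoded_search_term + b'\r\n')
read_until(br'^\. OK .*$')  # wait for the tagged completion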

With this code, the search command returns the UIDs of the emails with the text "Grüße":

C: b'. UID SEARCH CHARSET UTF-8 TEXT {7}\r\n'
S: b'+ Ready for literal data\r\n'
C: b'Gr\xc3\xbc\xc3\x9fe\r\n'
S: b'* SEARCH 1 3 4\r\n'
S: b'. OK UID SEARCH completed\r\n'

If I use the length in characters (len(search_term)) instead of the encoded length in octets (len(encoded_search_term)), the IMAP server reports an error:

C: b'. UID SEARCH CHARSET UTF-8 TEXT {5}\r\n'
S: b'+ Ready for literal data\r\n'
C: b'Gr\xc3\xbc\xc3\x9fe\r\n'
S: b'. BAD expected end of data instead of "\\237e"\r\n'
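The "\237e" in that error can be reproduced offline. The following is only a sketch of what the server presumably does with a declared length of 5: it consumes five octets as the literal and then trips over whatever is left. The variable names are made up for illustration.

```python
search_term = 'Grüße'
encoded = search_term.encode('utf-8')  # 7 octets: ü and ß take two each

declared = len(search_term)            # 5, the (wrong) character count
consumed = encoded[:declared]          # octets the server takes as the literal
leftover = encoded[declared:]          # octets left over after the literal

print(len(encoded))  # 7
print(consumed)      # b'Gr\xc3\xbc\xc3'
print(leftover)      # b'\x9fe'
```

The first leftover octet is 0x9f, which is 237 in octal, followed by "e". That matches the fragment quoted in the BAD response.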

Note that I didn't use Gmail for my tests.
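Applied to the search term from the question, the same idea might look like the sketch below. `build_search` is a hypothetical helper, not part of any IMAP library; it only builds the two byte strings to send and leaves the socket handling and the wait for the "+" continuation to the caller.

```python
def build_search(tag, term):
    """Return (command_line, literal_line) for a UTF-8 TEXT search.

    Hypothetical helper for illustration: the caller sends command_line,
    waits for the server's '+' continuation, then sends literal_line.
    """
    payload = term.encode('utf-8')
    # The number in braces is the size of the literal in octets.
    command = f'{tag} UID SEARCH CHARSET UTF-8 TEXT {{{len(payload)}}}\r\n'
    return command.encode('ascii'), payload + b'\r\n'


command_line, literal_line = build_search('A4', 'αβγδ')
print(command_line)  # b'A4 UID SEARCH CHARSET UTF-8 TEXT {8}\r\n'
print(literal_line)  # b'\xce\xb1\xce\xb2\xce\xb3\xce\xb4\r\n'
```

In C#, the counterpart would presumably be to obtain the payload with Encoding.UTF8.GetBytes and use the resulting array's Length as the literal size, since Encoding.Default is not guaranteed to be UTF-8.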

Upvotes: 4
