Reputation: 4188
C# IMAP search command with special characters like á, é
I am trying to implement the logic mentioned in the above post in C# to achieve non-ASCII searches in Gmail. After logging in successfully to imap.gmail.com, I have the following exchange with the server:
(C -> S) Encoding.Default.GetBytes("A4 UID SEARCH CHARSET UTF-8 TEXT {4}\r\n");
(C <- S) "+ go ahead\r\n"
(C -> S) Encoding.Default.GetBytes("αβγδ\r\n");
(C <- S) "* SEARCH 72\r\nA2 OK SEARCH completed (Success)"
However, the email denoted by the server's response is completely irrelevant to the search term I provided. This only happens when the keywords contain non-ASCII characters, so I believe something is wrong with my encoding.
I have also tried using Encoding.ASCII, but then I get search results that are even more off target.
What is the proper way to send the string literal "αβγδ\r\n"?
Upvotes: 1
Views: 2325
Reputation: 50044
For the search term, you are using a so-called literal. The length of a literal has to be specified in octets, which is not the case in your example: the string "αβγδ" encoded in UTF-8 consists of eight octets (two per character), not four.
So you should encode the search term first and then send its length in octets to the server.
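To make the mismatch concrete, here is a small check of the character count against the UTF-8 octet count for the search term from the question. The variable names are only for illustration.

```python
# Compare the number of characters with the number of UTF-8 octets.
search_term = 'αβγδ'
encoded = search_term.encode('utf-8')

char_count = len(search_term)   # characters in the string
octet_count = len(encoded)      # octets actually sent to the server

print(char_count, octet_count)  # prints: 4 8
```

The number inside the braces of the literal has to be the second value, not the first.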
I don't know much about C#, so here is an example in Python:
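# send() and read_until() are helpers around the connection (not shown here).
search_term = 'Grüße'
# Encode first, so that the length can be counted in octets.
encoded_search_term = search_term.encode('UTF-8')
length = str(len(encoded_search_term)).encode('ascii')
send(b'. UID SEARCH CHARSET UTF-8 TEXT {' + length + b'}\r\n')
# Wait for the continuation request before sending the literal data.
read_until(br'^\+ .*$')
send(encoded_search_term + b'\r\n')
read_until(br'^\. OK .*$')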
With this code, the search command returns the UIDs of the emails with the text "Grüße":
C: b'. UID SEARCH CHARSET UTF-8 TEXT {7}\r\n'
S: b'+ Ready for literal data\r\n'
C: b'Gr\xc3\xbc\xc3\x9fe\r\n'
S: b'* SEARCH 1 3 4\r\n'
S: b'. OK UID SEARCH completed\r\n'
If I use the length in characters (len(search_term)) instead of the encoded length in octets (len(encoded_search_term)), the IMAP server reports an error:
C: b'. UID SEARCH CHARSET UTF-8 TEXT {5}\r\n'
S: b'+ Ready for literal data\r\n'
C: b'Gr\xc3\xbc\xc3\x9fe\r\n'
S: b'. BAD expected end of data instead of "\\237e"\r\n'
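The error message can be reproduced on paper. The sketch below assumes that the server simply takes the declared number of octets as the literal and treats whatever follows as the rest of the command; that is my reading of the error above, not something taken from a server's source code.

```python
# Model how a too-short declared length splits the data that was sent.
payload = 'Grüße'.encode('utf-8') + b'\r\n'
declared_length = 5  # length in characters, which is the wrong unit

literal = payload[:declared_length]   # what the server takes as the literal
leftover = payload[declared_length:]  # what it finds where the line should end

print(literal)   # b'Gr\xc3\xbc\xc3'
print(leftover)  # b'\x9fe\r\n'
```

The leftover starts with the octet 0x9f, which is 237 in octal, followed by "e". That matches the "\237e" in the server's complaint.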
Note that I did not use Gmail for my tests.
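If it helps, the two pieces that have to go over the wire can be built in one place. This is only a sketch: build_uid_search is a hypothetical helper name, and sending the bytes and waiting for the "+" continuation between the two parts is still up to the caller.

```python
from typing import Tuple


def build_uid_search(tag: str, term: str) -> Tuple[bytes, bytes]:
    """Return the command line and the literal line for a UTF-8 TEXT search.

    Hypothetical helper for illustration; it only builds bytes and does no I/O.
    """
    literal = term.encode('utf-8')
    # The literal length is counted in octets of the encoded term.
    command = f'{tag} UID SEARCH CHARSET UTF-8 TEXT {{{len(literal)}}}\r\n'
    return command.encode('ascii'), literal + b'\r\n'


command, data = build_uid_search('A4', 'αβγδ')
print(command)  # b'A4 UID SEARCH CHARSET UTF-8 TEXT {8}\r\n'
print(data)     # b'\xce\xb1\xce\xb2\xce\xb3\xce\xb4\r\n'
```

Send the first value, wait for the continuation request from the server, then send the second value.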
Upvotes: 4