Reputation: 1229
Python's imaplib sometimes returns strings that look like this:
=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=
What is the name for this notation?
How can I decode (or should I say encode?) it to UTF-8?
Upvotes: 5
Views: 3316
Reputation: 147
You can use the bytes decoder directly instead. Here is an example:
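# Search the mailbox and return the matching UIDs
# (imapSession is assumed to be a logged-in imaplib connection with a mailbox selected)
result, data = imapSession.uid('search', None, "ALL")
# data[0] holds the UIDs separated by spaces; split() them and take the latest one with [-1]
latest_email_uid = data[0].split()[-1]
# Fetch the whole message; BODY.PEEK does not mark it as read
result, data = imapSession.uid('fetch', latest_email_uid, '(BODY.PEEK[])')
# data[0][1] holds the raw message bytes; decode them as UTF-8
raw_email = data[0][1].decode("utf-8")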
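Note that decoding the fetched bytes only yields the raw message text; a header written in the =?utf-8?Q?...?= form stays encoded. A minimal sketch of one way to get the readable subject, assuming the standard email package with policy.default (raw_bytes below is a made-up stand-in for data[0][1], not real IMAP output):

```python
import email
from email import policy

# Hypothetical raw message standing in for data[0][1] from the fetch call
raw_bytes = (
    b"Subject: =?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=\r\n"
    b"\r\n"
    b"body\r\n"
)

# Plain byte decoding leaves the encoded subject untouched
raw_text = raw_bytes.decode("utf-8")

# Parsing with policy.default decodes headers when they are accessed
message = email.message_from_bytes(raw_bytes, policy=policy.default)
subject = str(message["Subject"])
```

Here raw_text still contains the =?utf-8?Q? marker, while subject holds the decoded text.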
Upvotes: 1
Reputation: 16184
In short:
>>> from email.header import decode_header
>>> msg = decode_header('=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=')[0][0].decode('utf-8')
>>> msg
'Repertuar wydarze\u0144 z woj. Dolno\u015bl\u0105skie'
My computer doesn't show the Polish characters, but they should appear on yours (depending on locale settings etc.).
Explained:
Use the email.header decoder:
>>> from email.header import decode_header
>>> value = decode_header('=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=')
>>> value
[(b'Repertuar wydarze\xc5\x84 z woj. Dolno\xc5\x9bl\xc4\x85skie', 'utf-8')]
That returns a list of tuples, each holding a decoded part of the header (as bytes) and the charset detected for it. Usually the list contains a single tuple, but a header with several parts yields more than one pair.
>>> msg, encoding = decode_header('=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=')[0]
>>> msg
b'Repertuar wydarze\xc5\x84 z woj. Dolno\xc5\x9bl\xc4\x85skie'
>>> encoding
'utf-8'
And finally, if you want msg as a normal string (str) rather than UTF-8 encoded bytes, use the bytes decode method:
>>> msg = msg.decode('utf-8')
>>> msg
'Repertuar wydarze\u0144 z woj. Dolno\u015bl\u0105skie'
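As for the name of the notation: this is the MIME "encoded-word" syntax defined in RFC 2047. Since decode_header can return more than one pair, a header with several parts needs every pair decoded and joined. A minimal sketch of that, where decode_mime_header is a hypothetical helper name and the "ascii" fallback for parts without a charset is an assumption:

```python
from email.header import decode_header, make_header

def decode_mime_header(raw):
    """Decode a header that may contain RFC 2047 encoded-words into a str."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Assumption: parts without a declared charset are treated as ASCII
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # Headers without any encoded-word come back as str already
            parts.append(chunk)
    return "".join(parts)

encoded = "=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?="
decoded = decode_mime_header(encoded)

# The standard library can also do the joining itself
decoded_alt = str(make_header(decode_header(encoded)))
```

Both decoded and decoded_alt should hold the same string as msg above; the make_header variant is shorter, while the explicit loop makes it easier to choose how undecodable bytes are handled.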
Upvotes: 3