user2449761

Reputation: 1229

How to handle a UTF-8 string from Python's imaplib

Python's imaplib sometimes returns strings that look like this:

=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=

What is the name for this notation?

How can I decode (or should I say encode?) it to UTF-8?

Upvotes: 5

Views: 3316

Answers (2)

Jatin Mahajan

Reputation: 147

You can use the bytes decoder directly. Here is an example:

# Assumes imapSession is an already logged-in imaplib connection with a mailbox selected.
result, data = imapSession.uid('search', None, "ALL")  # search and return UIDs
latest_email_uid = data[0].split()[-1]  # data[0] holds space-separated UIDs; split() separates them and [-1] takes the latest

result, data = imapSession.uid('fetch', latest_email_uid, '(BODY.PEEK[])')

raw_email = data[0][1].decode("utf-8")  # decode the raw message bytes as UTF-8
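
Decoding the raw bytes alone leaves header values such as =?utf-8?Q?...?= in their encoded form. As a minimal sketch, assuming the fetched bytes are a complete RFC 5322 message, the standard library's email parser with policy.default can decode such headers when they are read. The sample message below (raw_bytes, the sender address, the body) is made up for illustration and stands in for data[0][1].

```python
import email
from email import policy

# Hypothetical stand-in for data[0][1] from the IMAP fetch above.
raw_bytes = (
    b"Subject: =?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=\r\n"
    b"From: sender@example.com\r\n"
    b"\r\n"
    b"Body text\r\n"
)

# policy.default makes header access return decoded text
# instead of the raw encoded-word form.
message = email.message_from_bytes(raw_bytes, policy=policy.default)
subject = str(message["Subject"])
print(subject)
```

Parsing from bytes also avoids guessing one charset for the whole message, since each header and body part can declare its own.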

Upvotes: 1

Uriel

Reputation: 16184

In short:

>>> from email.header import decode_header
>>> msg = decode_header('=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=')[0][0].decode('utf-8')
>>> msg
'Repertuar wydarze\u0144 z woj. Dolno\u015bl\u0105skie'

My computer doesn't display the Polish characters, but they should appear on yours (depending on locale and terminal settings).


Explained:

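This notation is the MIME encoded-word syntax defined in RFC 2047 (the Q marks a quoted-printable-like encoding; B would mean Base64). Use the email.header decoder: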

>>> from email.header import decode_header
>>> value = decode_header('=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=')
>>> value
[(b'Repertuar wydarze\xc5\x84 z woj. Dolno\xc5\x9bl\xc4\x85skie', 'utf-8')]

decode_header returns a list of (decoded_bytes, charset) tuples. There is usually one tuple, but a header that mixes encoded and plain text, or uses several charsets, yields more than one pair.
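
Since a header can produce several pairs, taking only the first one may drop text. The following is a rough sketch of a helper that joins every pair; the name decode_mime_header and the fallback parameter are made up here, and falling back to UTF-8 when no charset is reported is an assumption, not something the email module prescribes. Unknown charset names are not handled.

```python
from email.header import decode_header

def decode_mime_header(raw, fallback="utf-8"):
    """Join all (chunk, charset) pairs from decode_header into one str.

    `fallback` is an assumed default for chunks with no declared charset.
    """
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Encoded chunks arrive as bytes along with their charset.
            parts.append(chunk.decode(charset or fallback, errors="replace"))
        else:
            # A header with no encoded words comes back as a plain str.
            parts.append(chunk)
    return "".join(parts)

subject = decode_mime_header(
    "=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?="
)
print(subject)
```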

>>> msg, encoding = decode_header('=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=')[0]
>>> msg
b'Repertuar wydarze\xc5\x84 z woj. Dolno\xc5\x9bl\xc4\x85skie'
>>> encoding
'utf-8'

And finally, if you want msg as a normal Python string (str) rather than bytes, use the bytes decode method:

>>> msg = msg.decode('utf-8')
>>> msg
'Repertuar wydarze\u0144 z woj. Dolno\u015bl\u0105skie'
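
As an alternative sketch, the same module provides make_header, which accepts the list returned by decode_header and builds a Header object; converting that object with str() gives the decoded text without calling decode on each chunk by hand. The variable names raw and subject are illustrative only.

```python
from email.header import decode_header, make_header

raw = "=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?="

# make_header consumes the (bytes, charset) pairs and str() joins them.
subject = str(make_header(decode_header(raw)))
print(subject)
```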

Upvotes: 3
