Elias Schoof

Reputation: 2046

Python imaplib: Display non-ASCII characters correctly

I am using Python 3.5 and imaplib to fetch an e-mail from Gmail and print its body. The body contains non-ASCII characters, but they come out 'encoded' in a strange way and I cannot work out how to fix this.

import email
import imaplib

c = imaplib.IMAP4_SSL('imap.gmail.com')
c.login('[email protected]', 'password')

c.select('Inbox')
_, data = c.fetch(b'12345', '(RFC822)')

mail = data[0][1]
message = email.message_from_bytes(mail)
payload = message.get_payload()

# For a multipart message, get_payload() returns a list of parts; use the first one.
body = payload[0].as_string()
print(body)

Gives

>> ... Mit freundlichen Gr=C3=BC=C3=9Fen ...

instead of the desired

>> ... Mit freundlichen Grüßen ...

It looks to me like this is not an issue of encoding but one of conversion. But how do I tell Python to convert the characters correctly? Is there a more convenient library?

Upvotes: 2

Views: 1240

Answers (1)

snakecharmerb

Reputation: 55640

The text is encoded with quoted-printable encoding, which is a way to represent non-ASCII characters in ASCII text: each non-ASCII byte is written as "=" followed by two hex digits. You can decode it using Python's quopri module.

>>> import quopri
>>> bs = b'Gr=C3=BC=C3=9Fen'

>>> # Decode quoted-printable to raw bytes.
>>> utf8 = quopri.decodestring(bs)

>>> # Decode bytes to text.
>>> s = utf8.decode('utf-8')
>>> print(s)
Grüßen

You may find that quoted-printable is the value of the email's Content-Transfer-Encoding header, which is how a mail client knows to decode the body this way.
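As an alternative to calling quopri by hand, the email package can undo the transfer encoding itself. The following is a minimal sketch, not a definitive implementation: the raw bytes are a made-up sample standing in for what c.fetch() returns, decode_text_part is a hypothetical helper name, and falling back to UTF-8 when no charset is declared is an assumption.

```python
import email

# Hypothetical sample message standing in for the bytes returned by c.fetch().
raw = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Mit freundlichen Gr=C3=BC=C3=9Fen\r\n"
)

message = email.message_from_bytes(raw)


def decode_text_part(part):
    """Return the text of a non-multipart message part (hypothetical helper)."""
    # decode=True undoes the Content-Transfer-Encoding
    # (quoted-printable or base64) and returns raw bytes.
    payload_bytes = part.get_payload(decode=True)
    # Use the charset declared in Content-Type; UTF-8 is an assumed fallback.
    charset = part.get_content_charset() or "utf-8"
    return payload_bytes.decode(charset, errors="replace")


# walk() yields the message itself and, for multipart mail, every sub-part.
bodies = [
    decode_text_part(part)
    for part in message.walk()
    if part.get_content_type() == "text/plain"
]

for body in bodies:
    print(body)
```

This way the same code also handles parts sent as base64, because get_payload(decode=True) picks the decoder based on the header rather than assuming quoted-printable.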

Upvotes: 6
