Python: What is this encoding and how to decode?

Question

I have a lot of strings from mail bodies that print like this:

=C3=A9

For example, this should be 'é'.

What exactly is this encoding and how to decode it?

I'm using Python 3.5.

EDIT:

I managed to get the body of the mail properly decoded by applying:

quopri.decodestring(sometext).decode('utf-8')
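
As a quick check, here is a minimal sketch of that call applied to the sample from above. The variable names are illustrative only, and it assumes the decoded bytes are UTF-8, as they are for this sample:

```python
import quopri

# The sample from the mail body, as bytes
sample = b"=C3=A9"

# quopri.decodestring() turns each =XX escape back into a raw byte;
# the resulting bytes are then interpreted as UTF-8 text.
decoded = quopri.decodestring(sample).decode("utf-8")
print(decoded)
```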

However, I still struggle to get the From, To, Subject, etc. headers decoded correctly.

This is how I construct the e-mails:

import imaplib
import email
import quopri


mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('mail@gmail.com', '*******')
mail.list()

mail.select('"[Gmail]/All Mail"')

typ, data = mail.search(None, 'SUBJECT', '"{}"'.format('123456'))

print(data[0].split())

for e_mail in data[0].split():
    typ, data = mail.fetch(e_mail.decode(), '(RFC822)')
    raw_mail = data[0][1]
    email_message = email.message_from_bytes(raw_mail)
    if email_message.is_multipart():
        for part in email_message.walk():
            if part.get_content_type() == 'text/plain':
                body = part.get_payload()
                to = email_message['To']

                # Try to decode the header the same way as the body
                utf = quopri.decodestring(to)

                text = utf.decode('utf-8')
                print(text)
.
.
.

I still get this: =?UTF-8?B?UMOpdGVyIFBldMWRY3o=?=

Peter Petocz · Accepted Answer

This solved it:

from email.header import decode_header


def mail_header_decoder(header):
    # A missing header comes back as None; pass that through unchanged.
    if header is None:
        return None
    decoded_parts = []
    # decode_header() returns (value, charset) pairs; charset is None
    # for parts that were not encoded.
    for value, charset in decode_header(header):
        if isinstance(value, bytes):
            # Use the declared charset instead of always assuming UTF-8.
            value = value.decode(charset or 'ascii', errors='replace')
        decoded_parts.append(value)
    # Join the parts back into a single string.
    return ''.join(decoded_parts)
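
For context on what the two formats are: the =C3=A9 form in the body is quoted-printable, and the =?UTF-8?B?...?= form in the headers is an RFC 2047 "encoded word", where the B marks Base64 and UTF-8 is the charset. A shorter variant of the same idea, offered only as a sketch, hands the pairs from decode_header to email.header.make_header, which reassembles them using each part's declared charset. The function name decode_mail_header is made up for this example.

```python
from email.header import decode_header, make_header


def decode_mail_header(raw_header):
    """Return a header value with RFC 2047 encoded words decoded.

    Hypothetical helper for illustration; returns None for a missing header.
    """
    if raw_header is None:
        return None
    # decode_header() splits the value into (bytes_or_str, charset) pairs;
    # make_header() rebuilds a Header object from those pairs, and str()
    # renders it as ordinary text.
    return str(make_header(decode_header(raw_header)))


print(decode_mail_header("=?UTF-8?B?UMOpdGVyIFBldMWRY3o=?="))
```

Called on the header from the question, this prints the name with its accented characters restored. A header with no encoded words is returned unchanged.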
