Romain Jouin

Reputation: 4818

Python email payload decoding

I know this question has been asked thousands of times, but I am close to a nervous breakdown, so I can't help but ask for help.

I have an email with French accented characters. The sentence is:

"céline : Berlin Annette : 0633".

Python's email package turns

':' into '=3A'

"é" into "=E9".

How do I get the accented character back, and the ':' character as well?

I tried several things I found online:

Getting the payload:

>>> z = msg.get_payload()
>>> z
'C=E9line =3A Berlin Annette =3A 0633'
>>> infos(z)
(<type 'str'>, 'C=E9line =3A Berlin Annette =3A 0633')

decoding it by its charset:

>>> z = msg.get_payload().decode(msg.get_content_charset())
>>> z
u'C=E9line =3A Berlin Annette =3A 0633'
>>> infos(z)
(<type 'unicode'>, u'C=E9line =3A Berlin Annette =3A 0633')

or encoding it in utf_8 after decoding:

>>> z = msg.get_payload().decode(msg.get_content_charset()).encode('utf-8')
>>> z
'C=E9line =3A Berlin Annette =3A 0633'
>>> infos(z)
(<type 'str'>, 'C=E9line =3A Berlin Annette =3A 0633')

I also tried urllib:

>>> urllib.unquote(z)
'C=E9line =3A 00493039746784 Berlin Annette =3A 0633'

Nothing seems to work :(

Upvotes: 5

Views: 8552

Answers (1)

falsetru

Reputation: 368904

You can use quopri.decodestring to decode the string.

>>> import quopri
>>> quopri.decodestring('C=E9line =3A 00493039746784 Berlin Annette =3A 0633')
'C\xe9line : 00493039746784 Berlin Annette : 0633'
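The result above is still a byte string: `\xe9` is the single byte that stands for "é" in Latin-1. To get readable text, the bytes also have to be decoded with the message's charset. Here is a minimal Python 3 sketch of both steps. The charset is an assumption (the question does not state it, but `=E9` for "é" is consistent with ISO-8859-1), and in Python 3 `quopri.decodestring` works on bytes rather than `str`.

```python
import quopri

# Quoted-printable payload from the question, written as bytes for Python 3.
raw = b"C=E9line =3A 00493039746784 Berlin Annette =3A 0633"

# Step 1: undo the quoted-printable transfer encoding (each =XX becomes one byte).
decoded_bytes = quopri.decodestring(raw)

# Step 2: turn the bytes into text. ISO-8859-1 is an assumption for this sketch;
# in real code, take the charset from msg.get_content_charset().
text = decoded_bytes.decode("iso-8859-1")
print(text)
```

Keeping the two steps separate makes it easier to see which layer (transfer encoding or charset) is responsible when the output looks wrong.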

If you pass decode=True to Message.get_payload, it will do the above for you:

msg.get_payload(decode=True)
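For an end-to-end picture, here is a hedged Python 3 sketch using the `email` package. The raw message and its headers are hypothetical, made up to match the question. `get_payload(decode=True)` only reverses the Content-Transfer-Encoding and returns bytes, so the charset decoding remains a separate call.

```python
import email

# Hypothetical raw message; the headers are assumptions chosen to match the question.
raw_message = (
    b'Content-Type: text/plain; charset="iso-8859-1"\n'
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"C=E9line =3A Berlin Annette =3A 0633\n"
)

msg = email.message_from_bytes(raw_message)

# decode=True reverses the quoted-printable encoding and returns bytes.
payload_bytes = msg.get_payload(decode=True)

# Use the declared charset; falling back to Latin-1 is an assumption of this sketch.
charset = msg.get_content_charset() or "iso-8859-1"
text = payload_bytes.decode(charset)
print(text)
```

For multipart messages, `get_payload(decode=True)` returns `None` on the container, so the same calls would be applied to each part from `msg.walk()` instead.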

Upvotes: 9
