Qrom

Reputation: 497

Gmail API decoding messages from everywhere

I am working with the Gmail API in Python to retrieve mails written in French, and I'm having a problem with accents.

I retrieve the messages with this:

message = service.users().messages().get(userId="me", id=i, format="raw").execute()

All I want is to get the body of the mail, so I start with this:

base64.urlsafe_b64decode(message['raw'].encode('ASCII'))

For some mails it works, and I retrieve all the mail data, including French text like:

"Cette semaine, vous vous êtes servis du module de révision 0 fois"

For some others, I get quoted-printable encoding, like this:

"Salut, =E7a farte?"

Quoted-printable encoding itself is not an issue, as I have built a simple decoding function using the quopri module. The main problem is that the sentence above looks wrongly encoded to me: the encoded character is ç, and I would expect it to be encoded like this:

"Salut, =C3=A7a farte?"

So with the wrongly encoded sentence, I end up with this kind of output:

Salut, �a farte?

I suspect the cause is the different mail clients: my first example is a message sent from the Gmail client to an Outlook address, and the second is the opposite, an Outlook message sent to a Gmail address.

My question is: is there a way to handle decoding for any possible scenario?

Upvotes: 0

Views: 2346

Answers (2)

dorian

Reputation: 6282

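The problem is that while quopri correctly translates the mail body from 7-bit data to 8-bit data, the encoding that you then use to convert this bytestring into a unicode string is not the right one. The byte 0xE7 is ç in ISO-8859-1, whereas UTF-8 encodes ç as the two bytes 0xC3 0xA7, so both quoted-printable forms are valid for their respective charsets. In your example, the charset appears to be ISO-8859-1: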

In [1]: import quopri

In [2]: quopri.decodestring('Salut, =E7a farte?').decode('iso-8859-1')
Out[2]: 'Salut, ça farte?'

Usually you should be able to get the correct encoding from the Content-Type header. This is what it looks like in a mail that uses quoted-printable UTF-8 encoding:

Content-Type: text/plain;charset=UTF-8
Content-Transfer-Encoding: quoted-printable
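
As a minimal sketch of reading the charset from that header instead of hard-coding it, the standard library's email package can parse the decoded raw message and apply both the Content-Transfer-Encoding and the declared charset. The function name extract_text_body and the sample message below are hypothetical and only for illustration; the sketch assumes the message declares its charset correctly.

```python
import base64
import email
from email import policy


def extract_text_body(raw_b64url: str) -> str:
    """Return the text body of a message given Gmail's base64url `raw` field."""
    # The `raw` field is the full RFC 2822 message, URL-safe base64 encoded.
    raw_bytes = base64.urlsafe_b64decode(raw_b64url.encode("ascii"))
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    # Prefer the text/plain part, fall back to text/html if there is none.
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    # get_content() undoes quoted-printable/base64 and decodes the bytes
    # with the charset declared in the part's Content-Type header.
    return part.get_content()


# Hypothetical sample resembling the Outlook message from the question.
sample = (
    b"From: sender@example.com\r\n"
    b"Content-Type: text/plain; charset=ISO-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Salut, =E7a farte?\r\n"
)
encoded = base64.urlsafe_b64encode(sample).decode("ascii")
print(extract_text_body(encoded).strip())
```

With this approach the same function should handle both the UTF-8 mail and the ISO-8859-1 mail, since the charset comes from each message rather than from the code.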

Upvotes: 2

Tung Thanh

Reputation: 36

Try this:

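import base64
# Without format="raw" the API returns the parsed payload (format="full")
message = service.users().messages().get(userId='me', id=i).execute()
content = message['payload']['body']['data']
# The Gmail API encodes body data as URL-safe base64
print(base64.urlsafe_b64decode(content).decode('utf-8'))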

This will get the content of the email, provided the message is single-part and its body is UTF-8 encoded.

Upvotes: 0
