ohadshay

Reputation: 285

Issues parsing email in Python with the "imaplib" library: HTML line character limit and additional non-Unicode chara

I'm using Python 3 and want to validate emails sent to my inbox. I'm using imaplib and have managed to get the email content. However, the mail is unreadable and looks corrupted (variable html123 in the code below) after I fetch it and extract the content with: mail_body = email.message_from_string(str(data[1][0][1], 'utf-8'))

this is the original mail i see in mailbox:

dear blabla, We’ve added new tasks to your account. Please log in to your account to review and.....

this is the mail i get in python | |


dear blabla, We=E2=80=99ve added new tasks to your account. Plea= se log in to your account....

So there are 3 issues in this example (the real mail has many more):

1. The ’ was replaced with =E2=80=99.
2. The word "Please" is cut at the end of the line with an =.
3. All the extra signs/characters (| | ---) you see above.

this is the relevant part in code:

 data = self.mail_conn.fetch(str(any_email_id), f'({fetch_protocol})')
 mail_body = email.message_from_string(str(data[1][0][1], 'utf-8'))
 html123 = mail_body.get_payload()
 x1 = (html2text.html2text(html123))

Upvotes: 0

Views: 64

Answers (1)

vinzenz

Reputation: 689

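The body you get from imaplib is still in "quoted-printable" encoding: https://en.wikipedia.org/wiki/Quoted-printable. In that encoding, =E2=80=99 is the three UTF-8 bytes of the ’ character, and a = at the end of a line is a soft line break that disappears once the text is decoded.

To decode it, you can use the built-in quopri module: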

import quopri
quopri.decodestring(b"we=E2=80=99ve").decode("utf-8")  # -> 'we’ve'
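
As an alternative to calling quopri by hand, the standard email package can undo the transfer encoding itself when the message is parsed from bytes. The sketch below is only an illustration under some assumptions: the `raw` bytes are a made-up stand-in for `data[1][0][1]` from the question, and the message is assumed to be a single text/html part declared as UTF-8 and quoted-printable. Real mails may be multipart and use other charsets.

```python
import email
from email import policy

# Hypothetical raw message, standing in for data[1][0][1] returned by fetch().
raw = (
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"<p>We=E2=80=99ve added new tasks to your account. Plea=\r\n"
    b"se log in to your account.</p>\r\n"
)

# Parse the bytes directly instead of converting them to str first.
msg = email.message_from_bytes(raw, policy=policy.default)

# Pick the HTML part if there is one, otherwise fall back to plain text.
body = msg.get_body(preferencelist=("html", "plain"))

# get_content() reverses quoted-printable and decodes with the declared charset.
html = body.get_content()
print(html)
```

The resulting `html` string could then be passed to `html2text.html2text` as in the question. With the older API, `mail_body.get_payload(decode=True)` returns the decoded bytes, which still need a `.decode()` with the right charset.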

Upvotes: 1
