Issue scraping HTML from Gmail

Question

I am trying to scrape HTML from an email in my Gmail account, using the email package and Beautiful Soup to get the data. For some reason, when I process the email as it arrives directly from the company that sends it, the HTML comes back like this:

PCFET0NUWVBFIGh0bWwgUFVCTElDICItLy93M2MvL2R0ZCB4aHRtbCAxLjAgdHJhbnNpdGlvbmFs
Ly9lbiIgImh0dHA6Ly93d3cudzMub3JnL3RyL3hodG1sMS9kdGQveGh0bWwxLXRyYW5zaXRpb25h
bC5kdGQiPjxodG1sIHN0eWxlPSJtYXJnaW46IDA7cGFkZGluZzogMDtmb250LWZhbWlseTogJ0hl
bHZldGljYSBOZXVlJywgJ0hlbHZldGljYScsIEhlbHZldGljYSwgQXJpYWwsIHNhbnMtc2VyaWY7
Ym94LXNpemluZzogYm9yZGVyLWJveCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzE5OTkveGh0
bWwiPjxoZWFkIHN0eWxlPSJtYXJnaW46IDA7cGFkZGluZzogMDtmb250LWZhbWlseTogJ0hlbHZl
dGljYSBOZXVlJywgJ0hlbHZldGljYScsIEhlbHZldGljYSwgQXJpYWwsIHNhbnMtc2VyaWY7Ym94
LXNpemluZzogYm9yZGVyLWJveCI+CiAgICA8bWV0YSBzdHlsZT0ibWFyZ2luOiAwO3BhZGRpbmc6
IDA7Zm9udC1mYW1pbHk6ICdIZWx2ZXRpY2EgTmV1ZScsICdIZWx2ZXRpY2EnLCBIZWx2ZXRpY2Es
IEFyaWFsLCBzYW5zLXNlcmlmO2JveC1zaXppbmc6IGJvcmRlci1ib3giIGh0dHAtZXF1aXY9IkNv
bnRlbnQtVHlwZSIgY29udGVudD0idGV4dC9odG1sOyBjaGFyc2V0PVVURi04IiAvPgogICAgPHRp

This is the code I am running to get the data above.

def grab_email(most_recent):
    result2, email_data = mail.uid('fetch', most_recent, '(RFC822)')
    raw_email = email_data[0][1].decode('utf-8')
    email_message = email.message_from_string(raw_email)
    return email_message

def get_data(email_message):
    for part in email_message.walk():
        content_type = part.get_content_type()
        if 'html' in content_type:
            html_ = part.get_payload()
            soup = BeautifulSoup(html_, 'lxml')
            text = soup.get_text()
            print(text)

When the email comes directly from the original sender, my code returns the block of seemingly random letters and numbers shown above. But if I forward the email to myself and run the code on the forwarded copy, it works perfectly and extracts the text exactly as it should. Any help figuring this out would be awesome!

Answers (1)
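
The output in the question looks like base64: it starts with PCFET0NUWVBFIGh0bWw, which decodes to "<!DOCTYPE html". A likely cause, then, is that the sender's HTML part has a Content-Transfer-Encoding of base64, and part.get_payload() returns the body still encoded. The forwarded copy was probably re-encoded in a form that happens to be readable as-is. Below is a minimal sketch of one way to handle this with the standard library, by passing decode=True to get_payload(). The function name extract_html_parts and the sample message are hypothetical and used only for illustration. The decoded strings can then be passed to BeautifulSoup as in the question.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def extract_html_parts(email_message):
    """Return the decoded HTML string of every text/html part.

    Hypothetical helper, not part of the original question's code.
    """
    html_parts = []
    for part in email_message.walk():
        if part.get_content_type() != 'text/html':
            continue
        # decode=True undoes the Content-Transfer-Encoding
        # (base64 or quoted-printable) and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Fall back to UTF-8 when the part does not declare a charset
        # (an assumption; the real message may need something else).
        charset = part.get_content_charset() or 'utf-8'
        html_parts.append(payload.decode(charset, errors='replace'))
    return html_parts


# Hypothetical sample message standing in for the fetched email.
# MIMEText with a utf-8 charset base64-encodes its body, which
# reproduces the symptom described in the question.
sample = MIMEMultipart('alternative')
sample.attach(MIMEText('plain fallback', 'plain', 'utf-8'))
sample.attach(MIMEText('<html><body><p>Hello</p></body></html>', 'html', 'utf-8'))

parsed = email.message_from_bytes(sample.as_bytes())
html_parts = extract_html_parts(parsed)
print(html_parts[0])
```

In the question's grab_email, email_data[0][1] is already bytes, so email.message_from_bytes(email_data[0][1]) could replace the manual .decode('utf-8') followed by message_from_string. That also avoids a decoding error if the raw message is not valid UTF-8.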
