Melody Anoni
Melody Anoni

Reputation: 243

Parse excel attachment from .eml file in python

I'm trying to parse a .eml file. The .eml has an excel attachment that's currently base 64 encoded. I'm trying to figure out how to decode it into XML so that I can later turn it into a CSV I can do stuff with.

This is my code right now:

import email

data = file('Openworkorders.eml').read()
msg = email.message_from_string(data)

for part in msg.walk():
    c_type = part.get_content_type()
    c_disp = part.get('Content Disposition')


    if part.get_content_type() == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
        excelContents = part.get_payload(decode = True)

        print excelContents

The problem is

When I try to decode it, it spits back something looking like this.

enter image description here

I've used this post to help me write the code above.

How can I get an email message's text content using Python?

Update:

This is exactly following the post's solution with my file, but part.get_payload() returns everything still encoded. I haven't figured out how to access the decoded content this way.

import email


data = file('Openworkorders.eml').read()
msg = email.message_from_string(data)
for part in msg.walk():
    if part.get_content_type() == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
        name = part.get_param('name') or 'MyDoc.doc'
        f = open(name, 'wb')
        f.write(part.get_payload(None, True)) 
        f.close()

        print part.get("content-transfer-encoding")

Upvotes: 4

Views: 3618

Answers (2)

Melody Anoni
Melody Anoni

Reputation: 243

Here is my solution:

I found 2 things out:

1.) I thought .open() was going inside the .eml and changing the selected decoded elements. I thought I needed to see decoded data before moving forward. What's really happening with .open() is it's creating a new file in the same directory of that .xlsx file. You must open the attachment before you will be able to deal with the data. 2.) You must open an xlrd workbook with the file path.

import email
import xlrd 

data = file('EmailFileName.eml').read()
    msg = email.message_from_string(data)  # entire message

    if msg.is_multipart():
        for payload in msg.get_payload():
            bdy = payload.get_payload()
    else:
        bdy = msg.get_payload()

    attachment = msg.get_payload()[1]


    # open and save excel file to disk
    f = open('excelFile.xlsx', 'wb')
    f.write(attachment.get_payload(decode=True))
    f.close()

    xls = xlrd.open_workbook(excelFilePath) # so something in quotes like '/Users/mymac/thisProjectsFolder/excelFileName.xlsx'

    # Here's a bonus for how to start accessing excel cells and rows
    for sheets in xls.sheets():
        list = []
        for rows in range(sheets.nrows):
            for col in range(sheets.ncols):
                list.append(str(sheets.cell(rows, col).value))

Upvotes: 0

Josh Friedlander
Josh Friedlander

Reputation: 11657

As is clear from this table (and as you have already concluded), this file is an .xlsx. You can't just decode it with unicode or base64: you need a special package. Excel files specifically are a bit tricker (for e.g. this one does PowerPoint and Word, but not Excel). There are a few online, see here - xlrd might be the best.

Upvotes: 2

Related Questions