user1895076

Reputation: 759

Error Extracting Outlook Email Data with Python

I have a Python script that uses os.walk and win32com.client to extract information from Outlook email files (.msg) in a folder and its subfolders on my C:/ drive. It appears to work, but when I try to do anything with the returned dataframe (such as emailData.head()), Python crashes. I also cannot write the dataframe to .csv because of a permission error.

Could the problem be that my code is not properly closing Outlook or each message? Any help would be appreciated.

import os
import win32com.client
import pandas as pd

# initialize Outlook client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")

# set input directory (where the emails are) and output directory (where you
# would like the email data saved)
inputDir = 'C:/Users/.../myFolderPath'
outputDir = 'C:/Users/.../myOutputPath'


def emailDataCollection(inputDir,outputDir):
    """ This function loops through an input directory to find
    all '.msg' email files in all folders and subfolders in the
    directory, extracting information from the email into lists,
    then converting the lists to a Pandas dataframe before exporting
    to a '.csv' file in the output directory
    """
    # Initialize lists
    msg_Path = []
    msg_SenderName = []
    msg_SenderEmailAddress = []
    msg_SentOn = []
    msg_To = []
    msg_CC = []
    msg_BCC = []
    msg_Subject = []
    msg_Body = []
    msg_AttachmentCount = []

    # Loop through the directory
    for root, dirnames, filenames in os.walk(inputDir):
        for filename in filenames:
            if filename.endswith('.msg'): # check to see if the file is an email
                filepath = os.path.join(root,filename) # save the full filepath
                # Extract email data into lists
                msg = outlook.OpenSharedItem(filepath)
                msg_Path.append(filepath)
                msg_SenderName.append(msg.SenderName)
                msg_SenderEmailAddress.append(msg.SenderEmailAddress)
                msg_SentOn.append(msg.SentOn)
                msg_To.append(msg.To)
                msg_CC.append(msg.CC)
                msg_BCC.append(msg.BCC)
                msg_Subject.append(msg.Subject)
                msg_Body.append(msg.Body)
                msg_AttachmentCount.append(msg.Attachments.Count)
                del msg

    # Convert lists to Pandas dataframe
    emailData = pd.DataFrame({'Path' : msg_Path,
                          'SenderName' : msg_SenderName,
                          'SenderEmailAddress' : msg_SenderEmailAddress,
                          'SentOn' : msg_SentOn,
                          'To' : msg_To,
                          'CC' : msg_CC,
                          'BCC' : msg_BCC,
                          'Subject' : msg_Subject,
                          'Body' : msg_Body,
                          'AttachmentCount' : msg_AttachmentCount
    }, columns=['Path','SenderName','SenderEmailAddress','SentOn','To','CC',
            'BCC','Subject','Body','AttachmentCount'])


    return(emailData)


# Call the function
emailData = emailDataCollection(inputDir,outputDir)

# Causes Python to crash
emailData.head()
# Fails due to permission error
emailData.to_csv(outputDir,header=True,index=False)

Upvotes: 0

Views: 2266

Answers (2)

Pavan Kumar Alluri

Reputation: 9

I get AttributeError: OpenSharedItem.SenderName when I run this on a large batch of emails. The code works fine on a small number of emails (I tried 5 and 10).

Upvotes: 0

Sidney Tio

Reputation: 51

Hope this isn't too late, but I managed to find out the source of the problem:

The kernel crashed because of the datetime data in msg_SentOn. If you check the type() of the elements in msg_SentOn, they are pywintypes.datetime objects, which are incompatible with pandas.

You need to convert the elements in msg_SentOn to plain datetime.datetime objects before building the dataframe.

The source here is useful to do so: http://timgolden.me.uk/python/win32_how_do_i/use-a-pytime-value.html
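
As a rough illustration of that conversion, here is a minimal sketch. The helper names to_plain_datetime and convert_sent_on are hypothetical (they are not part of pywin32 or pandas), and the sketch assumes each value exposes the usual year, month, day, hour, minute and second attributes, as datetime-like objects do. It drops any timezone information, which may or may not be what you want.

```python
import datetime


def to_plain_datetime(value):
    """Build a naive datetime.datetime from a datetime-like value.

    Hypothetical helper: only the calendar and clock fields are copied,
    so any tzinfo attached to the original value is discarded.
    """
    return datetime.datetime(
        value.year, value.month, value.day,
        value.hour, value.minute, value.second,
    )


def convert_sent_on(values):
    """Convert every entry of a list such as msg_SentOn."""
    return [to_plain_datetime(v) for v in values]
```

In the question's function this would be applied just before the dataframe is built, for example msg_SentOn = convert_sent_on(msg_SentOn).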

Upvotes: 1
