Reputation: 759
I have a Python script that uses os.walk and win32com.client to extract information from Outlook email files (.msg) in a folder and its subfolders on my C:/ drive. It appears to work, but when I try to do anything with the returned dataframe (such as emailData.head()), Python crashes. I also cannot write the dataframe to .csv because of a permission error.
I'm wondering whether my code is failing to properly close Outlook or each message, and whether that is what is causing the problem. Any help would be appreciated.
import os
import win32com.client
import pandas as pd
# initialize Outlook client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
# set input directory (where the emails are) and output directory (where you
# would like the email data saved)
inputDir = 'C:/Users/.../myFolderPath'
outputDir = 'C:/Users/.../myOutputPath'
def emailDataCollection(inputDir,outputDir):
    """ This function loops through an input directory to find
    all '.msg' email files in all folders and subfolders in the
    directory, extracting information from the email into lists,
    then converting the lists to a Pandas dataframe before exporting
    to a '.csv' file in the output directory
    """
    # Initialize lists
    msg_Path = []
    msg_SenderName = []
    msg_SenderEmailAddress = []
    msg_SentOn = []
    msg_To = []
    msg_CC = []
    msg_BCC = []
    msg_Subject = []
    msg_Body = []
    msg_AttachmentCount = []
    # Loop through the directory
    for root, dirnames, filenames in os.walk(inputDir):
        for filename in filenames:
            if filename.endswith('.msg'): # check to see if the file is an email
                filepath = os.path.join(root,filename) # save the full filepath
                # Extract email data into lists
                msg = outlook.OpenSharedItem(filepath)
                msg_Path.append(filepath)
                msg_SenderName.append(msg.SenderName)
                msg_SenderEmailAddress.append(msg.SenderEmailAddress)
                msg_SentOn.append(msg.SentOn)
                msg_To.append(msg.To)
                msg_CC.append(msg.CC)
                msg_BCC.append(msg.BCC)
                msg_Subject.append(msg.Subject)
                msg_Body.append(msg.Body)
                msg_AttachmentCount.append(msg.Attachments.Count)
                del msg
    # Convert lists to Pandas dataframe
    emailData = pd.DataFrame({'Path' : msg_Path,
                              'SenderName' : msg_SenderName,
                              'SenderEmailAddress' : msg_SenderEmailAddress,
                              'SentOn' : msg_SentOn,
                              'To' : msg_To,
                              'CC' : msg_CC,
                              'BCC' : msg_BCC,
                              'Subject' : msg_Subject,
                              'Body' : msg_Body,
                              'AttachmentCount' : msg_AttachmentCount
                              }, columns=['Path','SenderName','SenderEmailAddress','SentOn','To','CC',
                                          'BCC','Subject','Body','AttachmentCount'])
    return emailData
# Call the function
emailData = emailDataCollection(inputDir,outputDir)
# Causes Python to crash
emailData.head()
# Fails due to permission error
emailData.to_csv(outputDir,header=True,index=False)
Upvotes: 0
Views: 2266
Reputation: 9
I get AttributeError: OpenSharedItem.SenderName when I run this on a large batch of emails. The code works perfectly fine on a small number of emails (I tried 5 and 10).
Upvotes: 0
Reputation: 51
Hope this isn't too late, but I managed to find the source of the problem:
The kernel crashed because of the datetime data in msg_SentOn. If you check the type() of the elements in msg_SentOn, they are pywintypes.datetime objects, which pandas does not handle properly.
You need to convert the elements of msg_SentOn to plain datetime.datetime objects.
This source is useful for doing so: http://timgolden.me.uk/python/win32_how_do_i/use-a-pytime-value.html
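As a rough illustration of that conversion, here is a minimal sketch. The helper name to_plain_datetime and the FakeComTime stand-in class are hypothetical and only there so the snippet runs without Outlook; it assumes the values returned by msg.SentOn expose the usual year, month, day, hour, minute and second attributes. Note that it keeps the wall-clock fields as they are and drops any timezone information.

```python
import datetime


def to_plain_datetime(value):
    """Rebuild a datetime-like value as a naive datetime.datetime.

    Only the year..second attributes are read, so any timezone
    information on the original value is discarded.
    """
    return datetime.datetime(
        value.year, value.month, value.day,
        value.hour, value.minute, value.second,
    )


# Hypothetical stand-in for the objects returned by msg.SentOn,
# used here only so the example runs without Outlook installed.
class FakeComTime(datetime.datetime):
    pass


msg_SentOn = [FakeComTime(2019, 3, 14, 9, 26, 53, tzinfo=datetime.timezone.utc)]

# Convert every element before building the dataframe.
converted = [to_plain_datetime(v) for v in msg_SentOn]
```

In the question's loop, the same idea would be to append to_plain_datetime(msg.SentOn) instead of msg.SentOn, so the dataframe only ever receives plain datetime.datetime values.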
Upvotes: 1