Reputation: 7377
Looked around and couldn't find a satisfactory answer. Does anyone know how to parse .msg files from outlook with Python?
I've tried using mimetools and email.parser with no luck. Help would be greatly appreciated!
Upvotes: 57
Views: 134460
Reputation: 286
Even though this is an old thread, I hope this information might help someone who is looking for a solution to what the thread subject exactly says. I strongly advise using the solution of mattgwwalker in github, which requires OleFileIO_PL module to be installed externally.
Upvotes: 7
Reputation: 641
I succeeded extracting relevant fields from MS Outlook files (.msg) using msg-extractor
utilitity by Matt Walker.
pip install extract-msg
Note, it may require to install additional modules, in my case, it required to install imapclient:
pip install imapclient
import extract_msg
f = r'MS_Outlook_file.msg' # Replace with yours
msg = extract_msg.Message(f)
msg_sender = msg.sender
msg_date = msg.date
msg_subj = msg.subject
msg_message = msg.body
msg.close()
print('Sender: {}'.format(msg_sender))
print('Sent On: {}'.format(msg_date))
print('Subject: {}'.format(msg_subj))
print('Body: {}'.format(msg_message))
There are many other goodies in MsgExtractor utility, to be explored, but this is good to start with.
I had to comment out lines 3 to 8 within the file C:\Anaconda3\Scripts\ExtractMsg.py:
#"""
#ExtractMsg:
# Extracts emails and attachments saved in Microsoft Outlook's .msg files
#
#https://github.com/mattgwwalker/msg-extractor
#"""
Error message was:
line 3
ExtractMsg:
^
SyntaxError: invalid syntax
After blocking those lines, the error message disappeared and the code worked just fine.
Upvotes: 50
Reputation: 31
The extract-msg Python module (pip install extract-msg
) is also extremely useful because it allows quick access to the full headers from the message, something that Outlook makes much harder than necessary to get hold of.
My modification of Vladimir's code that shows full headers is:
#!/usr/bin/env python3
import extract_msg
import sys
msg = extract_msg.Message(sys.argv[1])
msg_sender = msg.sender
msg_date = msg.date
msg_subj = msg.subject
print('Sender: {}'.format(msg_sender))
print('Sent On: {}'.format(msg_date))
print('Subject: {}'.format(msg_subj))
print ("=== Details ===")
for k, v in msg.header.items():
print("{}: {}".format(k, v))
print(msg.body)
Upvotes: 3
Reputation: 29
I found on the net a module called MSG PY. This is Microsoft Outlook .msg file module for Python. The module allows you to easy create/read/parse/convert Outlook .msg files. The module does not require Microsoft Outlook to be installed on the machine or any other third party application or library in order to work. For example:
from independentsoft.msg import Message
appointment = Message("e:\\appointment.msg")
print("subject: " + str(appointment.subject))
print("start_time: " + str(appointment.appointment_start_time))
print("end_time: " + str(appointment.appointment_end_time))
print("location: " + str(appointment.location))
print("is_reminder_set: " + str(appointment.is_reminder_set))
print("sender_name: " + str(appointment.sender_name))
print("sender_email_address: " + str(appointment.sender_email_address))
print("display_to: " + str(appointment.display_to))
print("display_cc: " + str(appointment.display_cc))
print("body: " + str(appointment.body))
Upvotes: 1
Reputation: 754
This works for me:
import win32com.client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
msg = outlook.OpenSharedItem(r"C:\test_msg.msg")
print msg.SenderName
print msg.SenderEmailAddress
print msg.SentOn
print msg.To
print msg.CC
print msg.BCC
print msg.Subject
print msg.Body
count_attachments = msg.Attachments.Count
if count_attachments > 0:
for item in range(count_attachments):
print msg.Attachments.Item(item + 1).Filename
del outlook, msg
Please refer to the following post regarding methods to access email addresses and not just the names (ex. "John Doe") from the To, CC and BCC properties - enter link description here
Upvotes: 63
Reputation: 21
I was able to parse it similar way as Vladimir mentioned above. However I needed to make small change by adding a for loop. The glob.glob(r'c:\test_email*.msg') returns a list whereas the Message(f) expect a file or str.
f = glob.glob(r'c:\test_email\*.msg')
for filename in f:
msg = ExtractMsg.Message(filename)
msg_sender = msg.sender
msg_date = msg.date
msg_subj = msg.subject
msg_message = msg.body
Upvotes: 1
Reputation: 2307
I've tried the python email module and sometimes that doesn't successfully parse the msg file.
So, in this case, if you are only after text or html, the following code worked for me.
start_text = "<html>"
end_text = "</html>"
def parse_msg(msg_file,start_text,end_text):
with open(msg_file) as f:
b=f.read()
return b[b.find(start_text):b.find(end_text)+len(end_text)]
print parse_msg(path_to_msg_file,start_text,end_text)
Upvotes: 0