Vlad Buculei

Reputation: 15

How to extract data from .msg files and append it to a CSV file?

I am writing a script that extracts particular data (Subject, Date, Sender) from saved Outlook messages (.msg extension), and I want to write that data to a CSV file, one line per message.

The script should go through every file with the .msg extension in a folder and extract the data. This is what I have come up with so far.

This code creates the CSV file, but it copies the data from the first email it reads X times instead of moving on to the next one.

import os
import glob
import csv
import win32com.client

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")

files = glob.glob('PATH_TO_FILES\\*.msg')


for file in files:
    msg = outlook.OpenSharedItem(file)



    #print(file)

    #with open(file) as f:

        #msg=f.read()

        #print(msg)


    with open(r'Email.csv', mode='w') as file:
        fieldnames = ['Subject', 'Date', 'Sender']
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()

        #for f in os.listdir('.'):
        for f in files:


            #if not f.endswith('.msg'):
                #continue

    #msg = msg.Message(f)
                msg_sender = msg.SenderName
                msg_date = msg.SentOn
                msg_subj = msg.Subject
    #msg_message = msg.Body

                writer.writerow({'Subject': msg_subj, 'Date': msg_date, 'Sender': msg_sender})

Upvotes: 1

Views: 936

Answers (1)

Serge Ballesta

Reputation: 149175

It is a rather vicious mistake...

Just look at your structure:

for file in files:
    msg = outlook.OpenSharedItem(file)
    with open(r'Email.csv', mode='w') as file:
        for f in files:
            # process msg

and follow what happens:

  • you loop over the msg files
    • you store one
    • you open the csv file in 'w' mode erasing any previous data
    • you loop again over the msg files
      • and process the stored file

So you have two levels of looping over the msg files, and each iteration of the outer loop resets the csv file. In the end, only the last outer iteration matters, and it writes the same last message n times.
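
The effect can be reproduced without Outlook. The following is only a minimal sketch of the same loop structure: plain strings stand in for the messages, and `demo.csv` in a temporary folder is a hypothetical file name, not something from the question.

```python
import os
import tempfile

# Hypothetical output file in a throwaway folder.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
items = ["first", "second", "third"]  # stand-ins for the msg files

for stored in items:                      # outer loop: "store one"
    with open(path, mode="w") as out:     # 'w' truncates the file every time
        for _ in items:                   # inner loop over the same list
            out.write(stored + "\n")      # always writes the stored item

with open(path) as result:
    lines = result.read().splitlines()

print(lines)  # ['third', 'third', 'third']
```

Only the rows written during the last outer iteration survive, and they all repeat the last stored item.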

How to fix: just loop once over the files, after opening the csv file:

with open(r'Email.csv', mode='w') as file:
    for f in files:
        msg = outlook.OpenSharedItem(f)
        # process msg
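
For completeness, the single-loop structure might be filled in roughly as follows. This is a sketch, not a tested Outlook script: `export_messages` and its `open_item` parameter are names made up here so the loop can be tried without Outlook, and the `utf-8` encoding is an assumption. The properties `Subject`, `SentOn` and `SenderName` are the ones used in the question.

```python
import csv


def export_messages(paths, csv_path, open_item):
    """Write one CSV row per message file.

    open_item is any callable that takes a path and returns an object
    with Subject, SentOn and SenderName attributes
    (for example outlook.OpenSharedItem from the question).
    """
    written = 0
    # Open the CSV once, before the loop, so earlier rows are not erased.
    # newline="" is what the csv module documentation recommends.
    with open(csv_path, mode="w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=["Subject", "Date", "Sender"])
        writer.writeheader()
        for path in paths:
            msg = open_item(path)  # open the message inside the single loop
            writer.writerow({
                "Subject": msg.Subject,
                "Date": msg.SentOn,
                "Sender": msg.SenderName,
            })
            written += 1
    return written
```

With the `outlook` and `files` variables from the question, the call would presumably be `export_messages(files, r'Email.csv', outlook.OpenSharedItem)`.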

Upvotes: 1
