Will

Reputation: 255

Python email header extractor

I have a simple piece of code which looks at an email header and pulls out the Date, From, To and Subject fields. To do this I currently have to paste the email header into a .txt document so that the code can read it.

from email.parser import BytesHeaderParser
from glob import glob
import csv

fields = ['Date', 'From', 'To', 'Subject']

out = csv.writer(open('output.csv', 'w'))
out.writerow(["File name"]+fields)

parser = BytesHeaderParser()

for name in glob('*.msg'):
  with open(name, 'rb') as fd:
    msg = parser.parse(fd)
  out.writerow([name]+[msg[f] for f in fields])

I want to be able to do this in bulk, so that when I'm dealing with a large number of emails from the same phishing campaign I can put all the .msg files into one folder and run the script to extract the data I need.

Is this possible? I'm also willing to write the code in PowerShell.

Thanks.

Upvotes: 0

Views: 931

Answers (1)

Sam Mason

Reputation: 16213

I'd strongly suggest using one of the MIME parsers built into Python for handling emails. It's a relatively complicated format, and naive approaches such as matching individual lines of text tend to give you the wrong thing. For example, a header value can be folded across multiple lines, and line-by-line code would only pick up the first part of it.
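To illustrate the folding problem, here is a minimal sketch comparing a line-by-line lookup with the standard library's `HeaderParser`. The sample message and the helper names `naive_subject` and `parsed_subject` are made up for this illustration and are not part of the question's code.

```python
from email.parser import HeaderParser

# Hypothetical sample: the Subject header is folded onto a second line,
# which RFC 5322 marks by starting the continuation line with whitespace.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Quarterly invoice reminder,\n"
    " please review before Friday\n"
    "Date: Mon, 01 Jan 2024 10:00:00 +0000\n"
    "\n"
    "Body text\n"
)


def naive_subject(text):
    """Line-by-line lookup: only sees the first physical line of the header."""
    for line in text.splitlines():
        if line.startswith("Subject:"):
            return line[len("Subject:"):].strip()
    return None


def parsed_subject(text):
    """Let the email package join the folded lines into one header value."""
    msg = HeaderParser().parsestr(text)
    # The default (compat32) policy keeps the original line break inside the
    # value, so collapse runs of whitespace to get a single-line string.
    return " ".join(msg["Subject"].split())


print(naive_subject(raw))   # the continuation line is lost
print(parsed_subject(raw))  # the full subject
```

The naive version returns only the text up to the fold, while the parser returns the whole subject.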

It should be a simple matter of doing:

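from email.parser import HeaderParser
from glob import glob
import csv

fields = ['Date', 'From', 'To', 'Subject']

parser = HeaderParser()

# newline='' stops the csv module from writing blank rows on Windows,
# and the with block makes sure the CSV file is flushed and closed
with open('output.csv', 'w', newline='') as out_fd:
  out = csv.writer(out_fd)
  out.writerow(["File name"] + fields)

  # parse only the headers of every .msg file in the current directory
  for name in glob('*.msg'):
    with open(name) as fd:
      msg = parser.parse(fd)
    out.writerow([name] + [msg[f] for f in fields])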
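For the "one folder per campaign" workflow, the same loop can be wrapped in a function that takes the folder as an argument. This is only a sketch under a few assumptions: the name `extract_headers` and its parameters are hypothetical, the files are assumed to contain plain RFC 5322 text headers (a native Outlook .msg file is a binary format and would need converting first), and the files are read as bytes with `BytesHeaderParser` so that unexpected encodings don't raise a decoding error.

```python
import csv
from email.parser import BytesHeaderParser
from pathlib import Path

FIELDS = ["Date", "From", "To", "Subject"]


def extract_headers(folder, out_path, pattern="*.msg", fields=FIELDS):
    """Write one CSV row per matching file in `folder`; return the file count."""
    parser = BytesHeaderParser()
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(["File name"] + list(fields))
        for path in sorted(Path(folder).glob(pattern)):
            # Binary mode: the parser handles decoding of the header bytes.
            with open(path, "rb") as fd:
                msg = parser.parse(fd)
            row = [path.name]
            for field in fields:
                value = msg[field]
                # Missing headers become empty cells; folded values are
                # collapsed onto a single line.
                row.append("" if value is None else " ".join(str(value).split()))
            writer.writerow(row)
            count += 1
    return count
```

It could then be called as `extract_headers("campaign_folder", "output.csv")`, where the folder name is whatever directory holds the collected messages.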

Upvotes: 0
