Reputation: 111
I have an email that comes in every day. Its format is always the same, but some of the data changes. I wrote a VBA macro that exports the email to a text file. Now that it is a text file, I want to parse it so that I get only the new information.
The format of the email is like this:
> Irrelevant data
> Irrelevant data
> Type: YUK
> Status: OPEN
> Date: 6/22/2015
> Description: ----
>
> Description blah blah blah
> Thank you
I want to capture the relevant data. For example, in this case I would like to capture only YUK, OPEN, 6/22/2015 and "Description blah blah blah". I tried using the csv module to go line by line and print out the lines, but I can't seem to find a way to parse that information.
This is what I have so far. It only prints out the lines, though.
import os
import glob
import csv
path = "emailtxt/"
glob = max(glob.iglob(path + '*.txt'), key=os.path.getctime)#most recent file located in the emailtxt
newestFile = os.path.basename(glob)#removes the emailtxt\ from its name
f = open(path+newestFile)
read = csv.reader(f)
for row in read:
    print row
f.close()
How would I parse through the text file?
Upvotes: 1
Views: 3260
Reputation: 1858
I don't think the csv module is the right one to use here. If you are just doing a simple search, use string comparisons and split the lines on characteristic characters. If it's more sophisticated, go for regular expressions.
with open("email.txt") as f:
    # Strip the "> " quote marker from every line.
    data = [line.replace("> ", "") for line in f]
for line in data:
    # Lines such as "Type: YUK" split into a label and a value.
    s = line.split(":")
    if len(s) > 1:
        print s[1].strip()
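The same split-on-colon idea can also collect the values under their labels instead of just printing them. The following is a minimal sketch, not a definitive implementation: `parse_fields` and the `sample` list are hypothetical names introduced for illustration, and it assumes every relevant line has the form `> Label: value`. It is written with the `print()` function.

```python
def parse_fields(lines):
    """Collect 'Label: value' pairs from quoted email lines into a dict."""
    fields = {}
    for line in lines:
        # Drop the leading quote marker and surrounding whitespace.
        text = line.strip().lstrip(">").strip()
        # Split on the first colon only, so values may contain colons.
        label, sep, value = text.partition(":")
        if sep:  # lines without a colon are ignored
            fields[label.strip()] = value.strip()
    return fields


# Hypothetical sample lines mirroring the email format in the question.
sample = [
    "> Irrelevant data",
    "> Type: YUK",
    "> Status: OPEN",
    "> Date: 6/22/2015",
]
fields = parse_fields(sample)
print(fields["Type"])    # YUK
print(fields["Status"])  # OPEN
print(fields["Date"])    # 6/22/2015
```

Using `partition` rather than `split(":")` keeps a value such as a time of day intact, because only the first colon separates the label from the value.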
Upvotes: 1
Reputation: 1625
How about using regular expressions?
import re

def get_info(string_to_search):
    res_dict = {}
    # Each pattern matches a label, optional whitespace, then the value.
    find_type = re.compile(r"Type:[\s]*[\w]*")
    res = find_type.search(string_to_search)
    res_dict["Type"] = res.group(0).split(":")[1].strip()
    find_Status = re.compile(r"Status:[\s]*[\w]*")
    res = find_Status.search(string_to_search)
    res_dict["Status"] = res.group(0).split(":")[1].strip()
    find_date = re.compile(r"Date:[\s]*[/0-9]*")
    res = find_date.search(string_to_search)
    res_dict["Date"] = res.group(0).split(":")[1].strip()
    # The description is everything after "Description:", minus the sign-off.
    res_dict["description"] = string_to_search.split("Description:")[1].replace("Thank you","")
    return res_dict
search_string = """> Irrelevant data
> Irrelevant data
> Type: YUK
> Status: OPEN
> Date: 6/22/2015
> Description: ----
>
> Description blah blah blah
> Thank you
"""
info = get_info(search_string)
print info
print info["Type"]
print info["Status"]
print info["Date"]
print info["description"]
Output :
{'Status': 'OPEN', 'Date': '6/22/2015', 'Type': 'YUK', 'description': ' ----\n>\n> Description blah blah blah\n> \n'}
YUK
OPEN
6/22/2015
----
>
> Description blah blah blah
>
Upvotes: 1
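The description returned by the regular-expression answer above still contains the `>` quote markers and the `----` separator. A possible variation, offered only as a sketch: `FIELD_RE`, `parse_email` and `sample_text` are hypothetical names, and the code assumes the text always contains a `Description:` label followed by a "Thank you" sign-off, as in the question's sample.

```python
import re

# One pattern for the three single-line fields. MULTILINE makes ^ match at
# the start of every line; [ \t]* keeps the match on a single line.
FIELD_RE = re.compile(r"^>?[ \t]*(Type|Status|Date):[ \t]*(\S+)", re.MULTILINE)


def parse_email(text):
    """Return Type, Status, Date and a cleaned Description from the email text."""
    info = dict(FIELD_RE.findall(text))
    # Everything between "Description:" and the sign-off is the body.
    body = text.split("Description:", 1)[1].split("Thank you", 1)[0]
    kept = []
    for line in body.splitlines():
        line = line.strip().lstrip(">").strip()
        # Skip blank lines and the "----" separator.
        if line and set(line) != {"-"}:
            kept.append(line)
    info["Description"] = " ".join(kept)
    return info


sample_text = """> Irrelevant data
> Irrelevant data
> Type: YUK
> Status: OPEN
> Date: 6/22/2015
> Description: ----
>
> Description blah blah blah
> Thank you
"""
info = parse_email(sample_text)
print(info["Description"])  # Description blah blah blah
```

Matching all labels with one alternation avoids repeating the compile, search and split steps for each field.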
Reputation: 11
If you are able to print out the rows individually, parsing them is a matter of breaking apart the rows (which are represented as strings). Assuming there is a colon or some space after each item descriptor, you can use that to extract whatever comes after it. See Python's common string operations for ways to split a row at useful points.
In terms of actually parsing the data, you could use a series of if statements to catch each status or type. For the date, try the datetime.datetime.strptime function to parse the date into a datetime object (time.strptime returns a struct_time instead). All you have to do is match the format of the date, which in your case seems to be "%m/%d/%Y", with a capital Y because the year has four digits.
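As a small illustration of the date step, here is a sketch that assumes the date always appears as month/day/four-digit year, like the `6/22/2015` in the question. `parse_date` is a hypothetical helper name.

```python
from datetime import datetime


def parse_date(value):
    """Parse a month/day/four-digit-year string such as '6/22/2015'."""
    # %Y matches a four-digit year; %y would only match two digits.
    return datetime.strptime(value, "%m/%d/%Y")


parsed = parse_date("6/22/2015")
print(parsed.date().isoformat())  # 2015-06-22
```

If the text does not match the format, `strptime` raises a `ValueError`, which is a convenient signal that the email layout has changed.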
Upvotes: 1
Reputation: 10951
I don't think you need the csv module here at all. Regular file I/O will do what you want: read the file line by line, extract the data you need from each line, and store it in a list.
For example:
import os
import glob

path = "emailtxt/"
# most recent file located in emailtxt/ (a new name avoids shadowing the glob module)
newest_path = max(glob.iglob(path + '*.txt'), key=os.path.getctime)

capture_list = []  # list to hold captured words
with open(newest_path, 'r') as f:  # open the file for reading
    for line in f:  # go line by line
        words = line.split()
        if len(words) > 2:  # skip short lines such as a bare ">"
            capture_list.append(words[2])  # third token, e.g. "YUK" in "> Type: YUK"
Upvotes: 1