Reputation: 627
I have text files that look something like this:
<Jun/11 09:14 pm>Information i need to capture1
<Jun/11 09:14 pm> Information i need to capture2
<Jun/11 09:14 pm> Information i need to capture3
<Jun/11 09:14 pm> Information i need to capture4
<Jun/11 09:15 pm> Information i need to capture5
<Jun/11 09:15 pm> Information i need to capture6
and two datetimes like
15/6/2015-16:27:10 # startDateTime
15/6/2015-17:27:19 # endDateTime
I need to grab all the information in the logs between the two datetimes. Currently I make a datetime object from each of the two times I'm searching between.
I then read the file line by line and make a new datetime object that I compare against my start and end times to see if I should grab that line. However, the files are huge (150MB) and the code can take hours to run (on 100+ files).
The code looks something like
f = open(fileToParse, "r")
for line in f.read().splitlines():
    if line.strip() == "":
        continue
    lineDateTime = datetime.datetime(lineYear, lineMonth, lineDay, lineHour, lineMin, lineSec)
    if (startDateTime < lineDateTime < endDateTime):
        writeFile.write(line+"\n")
        between = True
    elif(lineDateTime > endDateTime):
        writeFile.write(line+"\n")
        break
    else:
        if between:
            writeFile.write(line+"\n")
I want to rewrite this using some more smarts. The files can hold months of information, however I usually only search for about 1 hour to 3 days of data.
Upvotes: 2
Views: 169
Reputation: 180461
You are reading the whole file into memory regardless. Instead, iterate over the file object and break when the date is beyond your upper limit:
with open(fileToParse, "r") as f:
    for line in f:
        if not line.strip():
            continue
        lineDateTime = datetime.datetime(lineYear, lineMonth, lineDay, lineHour, lineMin, lineSec)
        if startDateTime < lineDateTime < endDateTime:
            # line already ends with "\n" when iterating over the file object
            writeFile.write(line)
        elif lineDateTime > endDateTime:
            break
Obviously you still need to extract lineYear, lineMonth etc. from each line.
Using f.read().splitlines() reads the whole file into memory, so even if you pass the upper limit five lines in, you have still loaded everything. Splitting the lines then builds a second full copy as a list of all the lines.
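As a rough sketch of the "get lineYear, lineMonth etc." step, the timestamp between the angle brackets could be parsed with strptime. The helper name parse_line_timestamp is hypothetical, and so are the assumptions that "Jun/11" means month/day, that the year is not in the log line and must be supplied by the caller, and that month names are English abbreviations:

```python
import datetime


def parse_line_timestamp(line, year):
    """Return the datetime at the start of a log line, or None if there is none.

    Assumes lines look like "<Jun/11 09:14 pm> text" and that the year
    (absent from the line) is passed in by the caller.
    """
    if not line.startswith("<"):
        return None
    end = line.find(">")
    if end == -1:
        return None
    stamp = line[1:end]  # e.g. "Jun/11 09:14 pm"
    try:
        # Prepend the year so that Feb/29 parses correctly in leap years.
        return datetime.datetime.strptime(
            f"{year} {stamp}", "%Y %b/%d %I:%M %p"
        )
    except ValueError:
        return None
```

Returning None instead of raising keeps the calling loop simple: lines without a usable timestamp can just be skipped.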
You could also check that the month/year is the one you want and only create datetime objects for matching lines, which would be a lot faster.
If your lines start as above, with Jun/11, and you want Jun/11, then simply check if line.startswith("Jun/11") and only then start creating datetime objects (include the leading < in the prefix if your lines have it, as in the question's sample):
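with open(fileToParse, "r") as f:
    for line in f:
        # cheap string check: skip ahead until the wanted date appears
        if line.startswith("Jun/11"):  # use "<Jun/11" if the lines keep the leading "<"
            # note: the inner loop resumes at the line after this first match
            for line in f:
                try:
                    lineDateTime = datetime.datetime...
                except ValueError:
                    continue
                if startDateTime < lineDateTime < endDateTime:
                    # line already ends with "\n"
                    writeFile.write(line)
                elif lineDateTime > endDateTime:
                    break
            break  # done, do not keep scanning the rest of the file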
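Putting the two ideas together (stream the file, and only build datetime objects for lines that pass a cheap prefix check), a self-contained version might look like the sketch below. The function names day_prefixes and copy_lines_between are hypothetical. It assumes the file is in chronological order, lines start with a zero-padded "<Mon/DD hh:mm am/pm>" stamp, and the search window falls within a single calendar year, since the log lines carry no year:

```python
import datetime


def day_prefixes(start, end):
    """Build the "<Mon/DD" prefix for every calendar day from start to end."""
    prefixes = []
    day = start.date()
    while day <= end.date():
        prefixes.append(day.strftime("<%b/%d"))
        day += datetime.timedelta(days=1)
    return tuple(prefixes)


def copy_lines_between(src, dst, start, end):
    """Copy lines whose timestamp is strictly between start and end."""
    prefixes = day_prefixes(start, end)
    seen_wanted_day = False
    for line in src:  # streams the file, one line in memory at a time
        if not line.strip():
            continue
        if not line.startswith(prefixes):
            if seen_wanted_day:
                break  # sorted file: we are past the wanted days
            continue  # cheap skip, no datetime object created
        seen_wanted_day = True
        try:
            stamp = line[1:line.index(">")]
            when = datetime.datetime.strptime(
                f"{start.year} {stamp}", "%Y %b/%d %I:%M %p"
            )
        except ValueError:
            continue  # malformed line, ignore it
        if start < when < end:
            dst.write(line)  # line already ends with "\n"
        elif when > end:
            break
```

It could be called as copy_lines_between(open(fileToParse), writeFile, startDateTime, endDateTime). Since searches usually cover 1 hour to 3 days, the prefix tuple stays tiny and most lines are rejected by a plain string comparison.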
Upvotes: 2