Reputation: 727
I've got a segment of my script which creates a list of files to scan through for keywords.
The problem is that the log files collectively are around 11 GB. When I use grep in the shell to search through them, it takes around 4 or 5 minutes. When I do it with my Python script, it hangs the server to the point where I need to reboot it.
It doesn't seem right that it would bring the whole server down, but in reality I don't need it to scan all the files, just those modified within the last week.
I've got this so far:
logs = [log for log in glob('/var/opt/cray/log/p0-current/*') if not os.path.isdir(log)]
I assume I will need to add something prior to this to initially filter out the wrong files?
I've been playing with os.path.getmtime in this format:
logs = [log for log in glob('/var/opt/cray/log/p0-current/*') if not os.path.isdir(log)]
for log in logs:
    mtime = os.path.getmtime(log)
    if mtime < "604800":
        do-stuff (create a new list? Or update logs?)
That's kind of where I am now; it doesn't work, but I was hoping there was something more elegant I could do with the list inline.
Upvotes: 1
Views: 757
Reputation: 13779
Depending on how many filenames there are and how little memory you have (a 512 MB VPS?), it's possible you're running out of memory by creating two lists of all the filenames (one from glob and one from your list comprehension). It's not necessarily the case, but it's all I have to go on.
Try switching to iglob (which uses os.scandir under the hood and returns an iterator) and using a generator expression, and see if that helps.
Also, getmtime returns a timestamp (seconds since the epoch), not an interval measured from now, so comparing it against "604800" won't do what you want.
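For illustration, here is a quick check of what getmtime actually returns (the path is just an example, any existing file will do):

import os
import time

mtime = os.path.getmtime('/etc/hostname')  # example path only
print(mtime)                # e.g. 1694012345.6 -- seconds since the epoch, not an age
print(time.time() - mtime)  # the file's age in seconds, which is what you meant to test

# Note: mtime < "604800" raises TypeError on Python 3 (float can't be compared to str).

Computing the week-ago cutoff once and comparing each file's mtime against it fixes that, and a generator expression keeps memory flat: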
import os
import glob
import time

# Anything modified after this timestamp is less than a week old.
week_ago = time.time() - 7 * 24 * 60 * 60

# Generator expression: filenames are yielded one at a time instead of
# being collected into a list up front.
log_files = (
    x for x in glob.iglob('/var/opt/cray/log/p0-current/*')
    if not os.path.isdir(x)
    and os.path.getmtime(x) > week_ago
)

for filename in log_files:
    pass  # do something
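Since the end goal is scanning those files for keywords, here is a rough sketch of what the loop body could look like, continuing from the log_files generator above. Reading each file line by line keeps memory bounded even with 11 GB of logs; the keywords and the open() options are just placeholders for whatever your script actually needs:

keywords = ('error', 'panic')  # placeholder keywords

for filename in log_files:
    # errors='replace' in case the logs contain bytes that aren't valid UTF-8
    with open(filename, errors='replace') as f:
        for line in f:  # one line at a time, so a whole file is never held in memory
            if any(word in line for word in keywords):
                print(f'{filename}: {line.rstrip()}')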
Upvotes: 3