Reputation: 67
I have the following code, which is likely to be repeated. Meaning, I will need to search the entire log file at different points in the code to see whether two particular patterns exist in it. I can't always search for both patterns at once, at the start of the code.
Basically, below is what I have, and I'm looking for ways to optimize it, assuming the log file being read can be very large.
import re

textfile = open(logfile, 'r')
filetext = textfile.read()
textfile.close()

matchesBegin = re.search(BeginSearchDVar, filetext)
matchesEnd = re.search(EndinSearchDVar, filetext)
if matchesBegin is not None and matchesEnd is not None:
    LRangeA = SeeIfExactRangeIsFound()
    PatternCount = len(LRangeA)
    LRange = '\n'.join(LRangeA)
I know this can be optimized with the with statement, but I don't know how to go about doing that.
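For reference, a minimal sketch of the same read using with (this only tidies resource handling so the file is closed even on error; by itself it does not reduce memory use, since read() still loads the whole file):

import re

with open(logfile, 'r') as textfile:
    filetext = textfile.read()  # file is closed automatically on exit

matchesBegin = re.search(BeginSearchDVar, filetext)
matchesEnd = re.search(EndinSearchDVar, filetext)
if matchesBegin is not None and matchesEnd is not None:
    LRangeA = SeeIfExactRangeIsFound()
    PatternCount = len(LRangeA)
    LRange = '\n'.join(LRangeA)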
Upvotes: 0
Views: 106
Reputation: 330
If you're looking for optimization, use the mmap module.
Memory-mapping a file uses the operating system's virtual memory system to access the data on the file system directly, instead of using normal I/O functions. Memory-mapping typically improves I/O performance because it does not involve a separate system call for each access and it does not require copying data between buffers: the memory is accessed directly by both the kernel and the user application.
import mmap
import re

# Compile a bytes pattern with the DOTALL, IGNORECASE, and MULTILINE flags:
# match every sentence containing `stackoverflow`.
pattern = re.compile(rb'(\.\W+)?([^.]?stackoverflow[^.]*?\.)',
                     re.DOTALL | re.IGNORECASE | re.MULTILINE)

# Open the file using 'with', which initializes and finalizes the instance.
# Binary mode is used because mmap exposes the file as bytes.
with open(log_file, "rb") as file:
    # Create a new instance of mmap.
    with mmap.mmap(file.fileno(),           # fileno returns the file descriptor for IO
                   0,                       # bytes to map (0 maps the entire file)
                   access=mmap.ACCESS_READ  # set access flag to read-only
                   ) as m:                  # name the instance `m`
        # finditer yields match objects, so .group(0) works on each one;
        # collect them into a list so they can be counted as well
        matches = list(pattern.finditer(m))
        print("Matches:", len(matches))
        for match in matches:
            print(match.group(0))
If the file is truly massive, you could change the second argument (the number of bytes to map) to better suit your needs.
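For example, here is a minimal sketch of mapping the file in fixed-size windows instead of all at once. The window size is an assumption to tune, and note two caveats: the offset passed to mmap must be a multiple of mmap.ALLOCATIONGRANULARITY, and a match straddling a window boundary will be missed unless consecutive windows overlap.

import mmap
import re

pattern = re.compile(rb'stackoverflow', re.IGNORECASE)

# Hypothetical 64 MiB window; a multiple of mmap.ALLOCATIONGRANULARITY
# on common platforms, so every offset below is a valid mapping offset.
WINDOW = 64 * 1024 * 1024

with open(log_file, "rb") as file:
    size = file.seek(0, 2)  # seek to the end to get the file size in bytes
    offset = 0
    while offset < size:
        length = min(WINDOW, size - offset)
        with mmap.mmap(file.fileno(), length,
                       access=mmap.ACCESS_READ,
                       offset=offset) as m:
            for match in pattern.finditer(m):
                print(match.group(0))
        # Caveat: a match spanning two windows is missed here; to fix that,
        # overlap consecutive windows by the longest possible match length.
        offset += WINDOW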
Upvotes: 4