Reputation: 340
I've got a log file about 3000 lines long and I need to find the first occurrence of a string. What would be the most efficient way to go about it?
with open(filename, 'r') as f:
    match = re.search(r'^EXHAUST.*', f.read(), re.MULTILINE)
or
with open(filename, 'r') as f:
    for line in f:
        match = re.match(r'EXHAUST.*', line)
or is there a better way I'm not thinking of?
Upvotes: 1
Views: 625
Reputation: 103
You can measure the approximate time an approach takes with something as simple as Python's datetime library, for example:
import datetime
start = datetime.datetime.now()
# insert your code here #
end = datetime.datetime.now()
result = end - start
print(result)
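For snippets this short, a monotonic high-resolution clock tends to give steadier numbers than datetime. A minimal sketch using time.perf_counter from the standard library; the helper name time_call and the stand-in workload are made up for illustration, so swap in the search you actually want to measure:

```python
import time

def time_call(func, repeat=5):
    """Return the best wall-clock time in seconds over `repeat` calls of func."""
    best = float('inf')
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best

# Stand-in workload (hypothetical); replace with the search being measured.
elapsed = time_call(lambda: sum(range(100_000)))
print(f"{elapsed:.6f}s")
```

Taking the best of several runs reduces noise from whatever else the machine is doing.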
The thing is, with 3000 lines the time Python needs to find the phrase is low with both methods. In my tests, however, the first method was a bit faster when the text is located close to the end of the file. I tested a 454 KB text file with over 3000 lines, most of them whole paragraphs. The first method (below) took about 0.09s. I should mention that without the ^ anchor in the pattern, the same search took only 0.04s.
with open(filename, 'r') as f:
    match = re.search(phrase, f.read())
versus 0.12s for
with open(filename, 'r') as f:
    i = 0
    for line in f:
        i += 1
        match = re.match(phrase, line)
        if match:
            break
The break is needed here; otherwise match is overwritten on every iteration and ends up holding the result for the last line of the file. I used i to record the line on which the match was found, because the .start() and .end() positions are otherwise relative to the current line only. With the search method, .start() and .end() give the position within the whole file contents directly.
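With the read-everything approach, a line number can still be recovered from the match offset by counting the newlines that precede it. A small sketch, assuming made-up log contents in place of f.read():

```python
import re

# Made-up log contents standing in for f.read().
text = "INFO start\nWARN low fuel\nEXHAUST temp high\nEXHAUST temp ok\n"

match = re.search(r'^EXHAUST.*', text, re.MULTILINE)
if match:
    # Count the newlines before the match to get a 1-based line number.
    line_number = text.count('\n', 0, match.start()) + 1
    print(line_number, match.group())  # 3 EXHAUST temp high
```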
In my test case, though, the first occurrence was near the end of the .txt file. If it were closer to the start, the second method would win, because it stops at the matching line, whereas the first method always has to read the entire file before searching.
Unless you're doing this for competitive coding (where Python is probably not the best pick anyway), both methods take very little time.
Upvotes: 0
Reputation: 107287
In this case, a more Pythonic way is to use str.startswith:
with open(filename, 'r') as f:
    for line in f:
        if line.startswith('EXHAUST'):
            # do stuff
            break  # stop at the first occurrence
As for re.search vs re.match: if you want to match from the beginning of the string, it is more efficient to use re.match, which was designed for that purpose.
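The difference between the two functions can be seen on a couple of made-up strings. This is only an illustration of the documented behaviour, not a benchmark:

```python
import re

line = "temp EXHAUST high"

# re.match only tries at position 0, so a mid-line occurrence is not a match.
anchored = re.match(r'EXHAUST', line)
# re.search scans forward and finds the occurrence anywhere in the string.
anywhere = re.search(r'EXHAUST', line)

print(anchored)          # None
print(anywhere.start())  # 5

# Even with re.MULTILINE, re.match still only matches at the start of the string,
# which is why searching a whole file's contents needs re.search.
whole_file = "INFO start\nEXHAUST temp high\n"
print(re.match(r'^EXHAUST', whole_file, re.MULTILINE))                  # None
print(re.search(r'^EXHAUST', whole_file, re.MULTILINE) is not None)     # True
```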
Upvotes: 3
Reputation: 379
I like your second one, but performance-wise, since your regex is really simple, you can use the startswith method:
with open(filename, 'r') as f:
    for line in f:
        match = line.startswith('EXHAUST')
        if match:
            break  # line now holds the first occurrence
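The same early-exit loop can also be written with next() over a generator expression, which stops reading as soon as a line qualifies. A sketch, with io.StringIO and made-up contents standing in for the opened log file so that it runs on its own:

```python
import io

# io.StringIO stands in for open(filename, 'r'); the contents are made up.
log = io.StringIO("INFO start\nEXHAUST temp high\nEXHAUST temp ok\n")

# next() pulls only as many lines as needed; None is returned if nothing matches.
first = next((line for line in log if line.startswith('EXHAUST')), None)
print(repr(first))  # 'EXHAUST temp high\n'
```

The None default avoids a StopIteration when the string never appears.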
Upvotes: 1