Reputation: 1557
Is there a way to search text files in Python for a phrase without having to use for loops and if statements such as:
for line in file:
    if myphrase in line:
        do_something()
This seems like a very inefficient way to go through the file as it does not run in parallel if I understand correctly, but rather iteratively. Is re.search a more efficient system by which to do it?
Upvotes: 0
Views: 1731
Reputation: 2182
The tool you need is called regular expressions (regex).
You can use it as follows:
import re
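# re.search scans the whole string; re.match would only check the very start.
# re.escape treats myphrase as literal text rather than as a pattern.
if re.search(re.escape(myphrase), myfile.read()):
    do_something()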
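To see why the choice of function matters, here is a small sketch using made-up sample text (the text and phrase values are assumptions for illustration). re.match only succeeds if the phrase is at the very beginning of the string, while re.search looks anywhere in it:

```python
import re

# Made-up sample data standing in for the contents of a file.
text = "Hello World!\nI am a file."
myphrase = "I am"

# re.escape makes sure the phrase is matched literally.
at_start = re.match(re.escape(myphrase), text)
anywhere = re.search(re.escape(myphrase), text)

print(at_start)              # None: the text does not begin with the phrase
print(anywhere is not None)  # True: the phrase occurs on the second line
```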
Upvotes: 3
Reputation: 42129
Reading a sequential file (e.g. a text file) is always going to be a sequential process. Unless you can store it in separate chunks or skip ahead somehow it will be hard to do any parallel processing.
What you could do is separate the inherently sequential reading process from the searching process. This requires that the file content be naturally separated into chunks (e.g. lines) across which the search is not intended to find a result.
The general structure would look like this:
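A minimal sketch of one such structure, assuming lines are the natural chunks and the phrase never spans two lines. The function names (read_chunks, search_chunk, parallel_search) and the chunk_size and workers parameters are hypothetical, chosen only for illustration. It uses a thread pool for simplicity; because of the GIL, threads will not speed up a pure-Python substring search, so concurrent.futures.ProcessPoolExecutor would be the likelier choice if the per-chunk work were heavy.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import islice


def read_chunks(lines, chunk_size=1000):
    """Sequential part: yield lists of up to chunk_size lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def search_chunk(phrase, chunk):
    """Independent part: return the lines in one chunk that contain phrase."""
    return [line for line in chunk if phrase in line]


def parallel_search(lines, phrase, chunk_size=1000, workers=4):
    """Hand each chunk to a worker and collect matches in file order."""
    matches = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map submits every chunk up front and yields results
        # in the same order the chunks were read.
        results = pool.map(partial(search_chunk, phrase),
                           read_chunks(lines, chunk_size))
        for found in results:
            matches.extend(found)
    return matches
```

It could be called as parallel_search(open_file, "Hello"), where open_file is any iterable of lines, such as a file object opened in text mode.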
In this era of solid state drives and fast memory busses, you would need some pretty compelling constraining factors to justify going to that much trouble.
You can figure out your minimum processing time by measuring how long it takes to read (without processing) all the lines in your largest file. It is unlikely that the search process for each line will add much to that time given that I/O to read the data (even on an SSD) will take much longer than the search operation's CPU time.
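A rough way to make that comparison, as a sketch only: time one pass that just reads the lines and a second pass that reads and searches. The function name time_read_and_search and its parameters are hypothetical. Keep in mind that the second pass may be served from the operating system's file cache, so it is worth repeating the measurement a few times.

```python
import time


def time_read_and_search(path, phrase):
    """Return (read_seconds, read_and_search_seconds, hits) for one file."""
    # Pass 1: read every line without doing anything with it.
    start = time.perf_counter()
    with open(path, "r") as f:
        for line in f:
            pass
    read_only = time.perf_counter() - start

    # Pass 2: read every line and count the lines containing the phrase.
    hits = 0
    start = time.perf_counter()
    with open(path, "r") as f:
        for line in f:
            if phrase in line:
                hits += 1
    read_and_search = time.perf_counter() - start

    return read_only, read_and_search, hits
```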
Upvotes: 4
Reputation:
Let's say you have the file:
Hello World!
I am a file.
Then:
with open("file.txt", "r") as file:
    x = file.read()
# x is now:
# "Hello World!\nI am a file."
# With the whole file in one string, a single search call replaces the loop over lines.
# The with block closes the file automatically, so no file.close() is needed.
Edit:
To actually test how long it takes:
import time
start_time = time.time()
# Read File here
end_time = time.time()
print("This method took " + str(end_time - start_time) + " seconds to run!")
Another Edit:
I read some other articles and ran the test, and the fastest checking method, if you're just trying to get True or False, is:
x = file.read() # "Hello World!\nI am a file."
tofind = "Hello"
tofind_in_x = tofind in x
# True
This method was faster than regex in my tests by quite a bit.
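One way to run this kind of comparison yourself is with the standard timeit module. This is only a sketch: the sample text, the phrase, and the repeat count are made up, and the numbers will vary by machine and Python version.

```python
import re
import timeit

# Made-up sample data: the phrase sits at the very end so the
# whole string has to be scanned before it is found.
text = "Hello World!\nI am a file.\n" * 10000 + "needle"
tofind = "needle"

# Compile once so the regex timing does not include compilation.
pattern = re.compile(re.escape(tofind))


def with_in():
    return tofind in text


def with_regex():
    return pattern.search(text) is not None


in_seconds = timeit.timeit(with_in, number=200)
regex_seconds = timeit.timeit(with_regex, number=200)
print(f"in: {in_seconds:.6f}s  re.search: {regex_seconds:.6f}s")
```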
Upvotes: 3