Lamma

Reputation: 1557

Searching a text file in Python without for loops and if statements

Is there a way to search text files in Python for a phrase without having to use for loops and if statements such as:

for line in file:
    if myphrase in line:
        do_something()

This seems like a very inefficient way to go through the file since, if I understand correctly, it runs iteratively rather than in parallel. Is re.search a more efficient way to do it?

Upvotes: 0

Views: 1731

Answers (3)

Simon Crane

Reputation: 2182

The tool you need is called regular expressions (regex).

You can use it as follows:

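import re

# re.search scans the whole string; re.match only tests the very beginning.
# re.escape makes myphrase match literally instead of as a regex pattern.
if re.search(re.escape(myphrase), myfile.read()):
    do_something()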

Upvotes: 3

Alain T.

Reputation: 42129

Reading a sequential file (e.g. a text file) is always going to be a sequential process. Unless you can store it in separate chunks or skip ahead somehow it will be hard to do any parallel processing.

What you could do is separate the inherently sequential reading process from the searching process. This requires that the file content be naturally separated into chunks (e.g. lines) such that a match never needs to span two chunks.

The general structure would look like this:

  • initiate a list of processing threads with input queues
  • read the file line by line and accumulate chunks of lines up to a given threshold
  • when the threshold or the end of file is reached, add the chunk of lines to the next processing thread's input queue
  • wait for all processing threads to be done
  • merge results from all the search threads.
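The steps above can be sketched roughly as follows. This is only an illustration, not a tuned implementation: it swaps the hand-built threads and input queues for the standard library's ThreadPoolExecutor, and the names search_chunk, threaded_search, chunk_size and workers are made up for the example. Note also that in standard CPython the GIL keeps pure-Python search code from truly running in parallel across threads, so treat this as the shape of the pipeline rather than a guaranteed speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def search_chunk(start_line, lines, phrase):
    """Return the 1-based line numbers in this chunk that contain phrase."""
    return [start_line + i for i, line in enumerate(lines) if phrase in line]


def threaded_search(path, phrase, chunk_size=10_000, workers=4):
    """Read path sequentially, hand chunks of lines to worker threads, merge hits."""
    futures = []
    with open(path, encoding="utf-8") as f, ThreadPoolExecutor(max_workers=workers) as pool:
        start_line = 1
        while True:
            # Accumulate lines up to the threshold (hypothetical default: 10,000).
            chunk = list(islice(f, chunk_size))
            if not chunk:
                break  # end of file reached
            futures.append(pool.submit(search_chunk, start_line, chunk, phrase))
            start_line += len(chunk)
    # Leaving the with block waits for every worker; merge results in file order.
    return [n for fut in futures for n in fut.result()]
```

Calling threaded_search("file.txt", "Hello") would return the line numbers that contain the phrase, in file order.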

In this era of solid state drives and fast memory busses, you would need some pretty compelling constraining factors to justify going to that much trouble.

You can figure out your minimum processing time by measuring how long it takes to read (without processing) all the lines in your largest file. It is unlikely that the search process for each line will add much to that time given that I/O to read the data (even on an SSD) will take much longer than the search operation's CPU time.
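One way to take that measurement is sketched below: time a pass that only reads the lines, then a pass that reads and searches, and compare. The function names time_read_only and time_read_and_search are invented for this example, and UTF-8 is an assumption about the file's encoding. Keep in mind that a second pass over the same file may be served from the operating system's cache, so run each measurement several times before drawing conclusions.

```python
import time


def time_read_only(path):
    """Seconds needed to iterate over every line without inspecting it."""
    start = time.perf_counter()
    with open(path, encoding="utf-8") as f:
        for _ in f:
            pass
    return time.perf_counter() - start


def time_read_and_search(path, phrase):
    """Seconds needed to read every line and test it for phrase, plus the hit count."""
    start = time.perf_counter()
    hits = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if phrase in line:
                hits += 1
    return time.perf_counter() - start, hits
```

If the two timings come out close, the search itself is not the bottleneck and parallelizing it is unlikely to pay off.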

Upvotes: 4

user11229202

Reputation:

Let's say you have the file:

Hello World!
I am a file.

Then:

with open("file.txt", "r") as file:  # the with block closes the file for you
    x = file.read()
# x is now:
# "Hello World!\nI am a file."
# The whole file is one string, so a single search call covers all of it
# instead of a Python-level loop over the lines.

Edit:

To actually test how long it takes:

import time
start_time = time.perf_counter()  # monotonic, high-resolution timer
# Read file here
end_time = time.perf_counter()
print("This method took " + str(end_time - start_time) + " seconds to run!")

Another Edit:

I read some other articles and ran the test. If you're just trying to get True or False, the fastest checking method is:

with open("file.txt", "r") as file:
    x = file.read()  # "Hello World!\nI am a file."
tofind = "Hello"
tofind_in_x = tofind in x
# True

This method was faster than regex in my tests by quite a bit.
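If you want to repeat that comparison yourself, something like the following should do. It is a rough sketch, not the exact test I ran: the text is a synthetic stand-in for file.read(), the phrase "needle" is placed at the very end so both approaches have to scan the whole string, and the repeat count of 100 is arbitrary. Actual numbers will depend on your machine, Python version and data.

```python
import re
import timeit

# Synthetic stand-in for file.read(); the phrase sits at the very end.
text = "Hello World!\nI am a file.\n" * 20_000 + "needle\n"
tofind = "needle"
pattern = re.compile(re.escape(tofind))  # literal match, compiled once

# Both approaches must agree before timing them means anything.
found_with_in = tofind in text
found_with_regex = pattern.search(text) is not None

in_seconds = timeit.timeit(lambda: tofind in text, number=100)
regex_seconds = timeit.timeit(lambda: pattern.search(text), number=100)
print(f"in: {in_seconds:.4f}s  regex: {regex_seconds:.4f}s")
```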

Upvotes: 3
