Reputation: 127
I need a fast and efficient way to find, from a list of many pattern strings, one that is a valid substring of a given string.
Conditions -
Ask -
I have to traverse the file and, for each line, find a pattern string that is a valid substring of that line (whichever comes first in the list of 100 pattern strings).
Example -
pattern_strings = ["earth is round and huge","earth is round", "mars is small"]
Testcase file contents - Among all the planets, the earth is round and mars is small.
..
..
Hence, for the first line, the string at index 1 ("earth is round") should qualify the condition.
Currently, I am trying to do a linear search -
def search(line, list_of_patterns):
    # return the first pattern (in list order) that is a substring of the line
    for pat in list_of_patterns:
        if pat in line:
            return pat
    return -1
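For reference, on the example line above this returns the pattern at index 1:

pattern_strings = ["earth is round and huge", "earth is round", "mars is small"]
line = "Among all the planets, the earth is round and mars is small."
print(search(line, pattern_strings))  # -> "earth is round"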
The current run time is 21 minutes. The intent is to reduce it further. Need suggestions!
Upvotes: 0
Views: 170
Reputation: 47
One trick I know of, though it has nothing to do with changing your existing code, is to run your code with PyPy rather than the standard CPython interpreter. That alone can significantly speed up execution.
https://www.pypy.org/features.html
As I have installed and used it myself, I can tell you that installation is fairly simple.
This is one option if you do not want to change your code.
Another suggestion would be to time your code or run it under a profiler to see where the bottleneck is and what is taking a relatively long time.
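As a minimal profiling sketch (assuming search() and pattern_strings are defined as in the question, and that "testcase.txt" stands in for your real input file):

import cProfile
import pstats

def run_all():
    # run the search over every line of the file, as in the question
    with open("testcase.txt") as f:
        for line in f:
            search(line, pattern_strings)

cProfile.run("run_all()", "search.prof")
pstats.Stats("search.prof").sort_stats("cumulative").print_stats(10)

This prints the ten most expensive calls by cumulative time, which tells you whether the substring checks themselves or something else (e.g. I/O) dominates.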
Code-wise, you could avoid the explicit for loop and try these methods: https://betterprogramming.pub/how-to-replace-your-python-for-loops-with-map-filter-and-reduce-c1b5fa96f43a
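For example, a loop-free variant using a generator expression with next() keeps the same semantics as your search() (first pattern in list order that is a substring, else -1):

def search(line, list_of_patterns):
    # first matching pattern in list order, or -1 if none matches
    return next((pat for pat in list_of_patterns if pat in line), -1)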
A final option would be to rewrite that piece of code in a faster, more performant language such as C++ and call the resulting executable (an .exe if on Windows) from Python.
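A rough sketch of calling such an executable from Python (the names matcher.exe, patterns.txt and testcase.txt are just placeholders for whatever you build and pass in):

import subprocess

# invoke the compiled matcher and capture whatever it prints
result = subprocess.run(
    ["matcher.exe", "patterns.txt", "testcase.txt"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)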
Upvotes: 1