Reputation: 579
I'm doing the below in Python
I have a problem that involves checking a large list(size n) of input strings to see if the substrings contain any words from a large dictionary(size m).
I've looked around for efficient algorithms for this problem and found these: https://github.com/laurentluce/python-algorithms/blob/master/algorithms/string_matching.py
the Rabin-Karp and KMP matching algorithms.(note that I've replaced the ord() function for the Rabin-Karp with a dictionary for efficiency)
However, these actually perform slower than using an 'in' operation in Python which uses the Boyer–Moore–Horspool algorithm. I suppose that this is because the contains() method invoked by 'in' was is implemented in C.
How can I override this method with the Rabin-Karp for the string class in Python in C?
Upvotes: 1
Views: 363
Reputation: 371
You could have a look at cython:
http://docs.cython.org/src/quickstart/cythonize.html
I find it easier to write some custom code for a specific operation than overriding a core python structure.
Upvotes: 2
Reputation: 37003
You can't, I am sorry to say.
The built-in types, being constructed at the time the interpreter is compiled, cannot be patched at run-time. If speed is really so important then you might want to write a C extension type that subclasses the built-in string type but with a different contains method.
Upvotes: 1