Reputation: 9794
I was just looking at the implementation of the Java String
class's .indexOf()
method and it seems the author of the code uses the brute force algorithm to find the substring in a given string. That is, the approach runs in O(mn), where m and n are the length of the source and target strings, respectively.
Why didn't the author use a more efficient algorithm like Rabin-Karp, which has a runtime complexity of O(m + n) if a good hash function is provided ?
I might be missing out on the complete knowledge behind the reason for this implementation and hence wanted to understand.
Upvotes: 9
Views: 2955
Reputation: 594
Just a guess, but remember that Java String are stored as UTF-16 for i18n reasons, not plain 8-bit ASCII. It could be that the supporting some algorthims on UTF-16 (and the more difficult UTF-8) could be problematic. Just a guess, though.
Upvotes: 0
Reputation: 372814
I don't know for sure why this decision was made, but if I uad to guess it's probably because for small pattern strings (a very common use case) the naive brute force algorithm is probably as fast if not faster than some of the asymptotically faster algorithms like Rabin-Karp, Boyer-Moore, or Knuth-Morris-Pratt. This seems like a reasonable default algorithm since in many cases you'll be searching small strings for small patterns, and the overhead from a powerful matched setup probably would be comparable to the runtime of the naive approach.
That said, nowhere in the Java spec does it mandate the use of this algorithm. They could just as easily have picked Rabin-Karp as the default algorithm.
Another reason they may have opted for this approach is because if you want to do fast text searching, the regex library provides faster string matching with more powerful search capabilities. Giving users the simple brute force algorithm by default and the option to switch to a more powerful set of tools when needed seems like a good way to balance asymptotic efficiency with practical efficiency.
Upvotes: 15
Reputation: 41097
I guess they didn't think people will use it for very large strings. With string length less than 100 it won't be much different.
Upvotes: 0
Reputation: 533530
I assume you mean indexOf or contains rather than substring. Substring is O(1)
Often simple code runs faster. Code which creates objects for example often runs much slower.
Why don't you try to implement it for yourself and see if it is faster. If it is, you can suggest they improve the method.
Upvotes: 6