name_masked
name_masked

Reputation: 9794

Choice of algorithm for .indexOf method in Java

I was just looking at the implementation of the Java String class's .indexOf() method and it seems the author of the code uses the brute force algorithm to find the substring in a given string. That is, the approach runs in O(mn), where m and n are the length of the source and target strings, respectively.

Why didn't the author use a more efficient algorithm like Rabin-Karp, which has a runtime complexity of O(m + n) if a good hash function is provided ?

I might be missing out on the complete knowledge behind the reason for this implementation and hence wanted to understand.

Upvotes: 9

Views: 2955

Answers (4)

Preston Lee
Preston Lee

Reputation: 594

Just a guess, but remember that Java String are stored as UTF-16 for i18n reasons, not plain 8-bit ASCII. It could be that the supporting some algorthims on UTF-16 (and the more difficult UTF-8) could be problematic. Just a guess, though.

Upvotes: 0

templatetypedef
templatetypedef

Reputation: 372814

I don't know for sure why this decision was made, but if I uad to guess it's probably because for small pattern strings (a very common use case) the naive brute force algorithm is probably as fast if not faster than some of the asymptotically faster algorithms like Rabin-Karp, Boyer-Moore, or Knuth-Morris-Pratt. This seems like a reasonable default algorithm since in many cases you'll be searching small strings for small patterns, and the overhead from a powerful matched setup probably would be comparable to the runtime of the naive approach.

That said, nowhere in the Java spec does it mandate the use of this algorithm. They could just as easily have picked Rabin-Karp as the default algorithm.

Another reason they may have opted for this approach is because if you want to do fast text searching, the regex library provides faster string matching with more powerful search capabilities. Giving users the simple brute force algorithm by default and the option to switch to a more powerful set of tools when needed seems like a good way to balance asymptotic efficiency with practical efficiency.

Upvotes: 15

fastcodejava
fastcodejava

Reputation: 41097

I guess they didn't think people will use it for very large strings. With string length less than 100 it won't be much different.

Upvotes: 0

Peter Lawrey
Peter Lawrey

Reputation: 533530

I assume you mean indexOf or contains rather than substring. Substring is O(1)

Often simple code runs faster. Code which creates objects for example often runs much slower.

Why don't you try to implement it for yourself and see if it is faster. If it is, you can suggest they improve the method.

Upvotes: 6

Related Questions