Reputation: 5660
I have an sorted array of strings: eg: ["bar", "foo", "top", "zebra"] and I want to search if an input word is present in an array or not.
eg:
search (String[] str, String word) {
// binary search implemented + string comaparison.
}
Now binary search will account for complexity which is O(logn), where n is the length of an array. So for so good.
But, at some point we need to do a string compare, which can be done in linear time.
Now the input array can contain of words of different sizes. So when I am calculating final complexity will the final answer be O(m*logn) where
m
is the size of word we want to search in the array, which in our case is "zebra" the word we want to search?
Upvotes: 0
Views: 2366
Reputation: 29
I was under the impression that what original poster said was correct by saying time complexity is O(m*logn).
If you use the suggested enhancement to improve the time complexity (to get O(m + logn)) by tracking previously matched letters I believe the below inputs would break it.
arr = [“abc”, “def”, “ghi”, “nlj”, “pfypfy”, “xyz”]
target = “nljpfy”
I expect this would incorrectly match on “pfypfy”. Perhaps one of the original posters can weigh in on this. Definitely curious to better understand what was proposed. It sounds like matched number of letters are skipped in next comparison.
Upvotes: 0
Reputation: 19158
Yes, your thinking as well your proposed solution, both are correct. You need to consider the length of the longest String too in the overall complexity of String searching.
A trivial String compare is an O(m)
operation, where m
is the length of the larger of the two strings.
But, we can improve a lot, given that the array is sorted. As user "doynax" suggests,
Complexity can be improved by keeping track of how many characters got matched during the string comparisons, and store the present count for the lower and upper bounds during the search. Since the array is sorted we know that the prefix of the middle entry to be tested next must match up to at least the minimum of the two depths, and therefore we can skip comparing that prefix. In effect we're always either making progress or stopping the incremental comparisons immediately on a mismatch, and thereby never needing to keep going over old ground.
So, overall m
number of character comparisons would have to be done till the end of the string, if found OR else not even that much(if fails at early stage).
So, the overall complexity would be O(m + log n)
.
Upvotes: 4