finitenessofinfinity

Reputation: 1013

Time complexity when sorting is done before binary searching

Suppose there is an array containing unsorted data and I need to choose between linear search and binary search. Which option should I choose? The time complexity of linear search is O(n) and of binary search is O(log n). But binary search needs sorted data, and the fastest comparison-based sorting algorithms run in O(n * log n). I don't know how to "add" the complexities of two algorithms (if that's the right word), hence this question.

So my question is: is sorting and then binary searching better than simply linear searching, or is it the other way around?

Also, how do I prove it, whichever the case may be, using big O notation (I mean "adding" and "comparing" the time complexities)?

Thank you so much for reading!!! It means a lot.

Upvotes: 6

Views: 7999

Answers (3)

Grigor Gevorgyan

Reputation: 6853

If you have to do one search, do linear search. It's obviously better than sorting and then binary search.
But if you have multiple search queries, in most cases you should first sort the array and then apply a binary search for every query.
Why? Let's say you're going to perform k search queries. If you do a linear search each time, you'll end up with O(n*k) operations. If you sort first, that will take O(n log n) + O(k log n) = O((n+k) log n) operations. Which is better? When k is very small (less than about log n), it's better to do linear search. In most other cases you're better off sorting first.
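As a rough sketch of the comparison above, the two cost formulas can be evaluated directly. This assumes log means log base 2 and treats the big-O expressions as exact operation counts with constant factor 1, which real code will not match exactly. The function names are made up for illustration.

```python
import math


def linear_cost(n, k):
    """Estimated operations for k linear searches over n items: n * k."""
    return n * k


def sort_then_binary_cost(n, k):
    """Estimated operations for one sort plus k binary searches: (n + k) * log2(n)."""
    return (n + k) * math.log2(n)


def break_even_queries(n):
    """Smallest k for which sorting first is estimated to be cheaper (n >= 1)."""
    k = 1
    while linear_cost(n, k) <= sort_then_binary_cost(n, k):
        k += 1
    return k


if __name__ == "__main__":
    for n in (1024, 1_000_000):
        print(n, round(math.log2(n), 1), break_even_queries(n))
```

Under these assumptions the break-even number of queries stays close to log2(n), which is the "k less than log n" rule of thumb.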

Upvotes: 6

Jim Mischel

Reputation: 134065

You don't really "add" the complexities. Sorting is, as you say, O(n * log n), and searching is O(log n). If you were to do "normal math" on them, the sum would be (n+1)*log n, which is still O(n*log n).

When you're performing multiple steps like that, you typically take the highest complexity and call it that. After all, when n is sufficiently large, n*log n dwarfs log n.

Think of it this way: when n is 1,000,000, n*log n is 20 million. log n is 20. So what's the difference between 20,000,000 and 20,000,020? The (log n) term is irrelevant. So (n log n) + (log n) is, for all intents and purposes, equal to (n log n). Even when n is 100, log n is 7. The (log n) term just won't make a difference when n is even moderately large.
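To check those numbers, here is a small sketch. It assumes log base 2 and, purely for illustration, treats n*log n and log n as literal operation counts. `cost_terms` is a hypothetical helper name.

```python
import math


def cost_terms(n):
    """Return (sort_term, search_term) = (n * log2(n), log2(n))."""
    log_n = math.log2(n)
    return n * log_n, log_n


def search_share(n):
    """Fraction of the combined cost contributed by the single binary search."""
    sort_term, search_term = cost_terms(n)
    return search_term / (sort_term + search_term)


if __name__ == "__main__":
    for n in (100, 1_000_000):
        sort_term, search_term = cost_terms(n)
        print(n, round(sort_term), round(search_term), search_share(n))
```

The share works out to 1/(n+1), so the log n term vanishes quickly as n grows.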

In your particular case, if you only need to search the list one time, then sequential search is the way to go. If you need to search it multiple times, then you have to weigh the cost of m sequential searches, O(m * n), against the cost of sorting and then doing m binary searches, O(n * log n + m * log n). If you're interested in the minimum time and you know how many times you'll be searching the list, then you'd use sequential search if (m * n) is less than (n * log n) + (m * log n), which for m much smaller than n works out to roughly m < log n. Otherwise sort and then use binary search.

But that's not the only consideration. Binary search on a sorted list gives you very quick response time, whereas linear search can take a very long time for a single item. If you can afford to sort the list during program startup then that's probably the best way to go because items will be found (or not found) much faster once the program is operating. Sorting the list gives you better response time. It's better to pay the price of sorting during startup than to experience very unpredictable response times during operation. Or to find out that you need to do more searches than you thought. . .
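A minimal sketch of the "sort once at startup, then search many times" approach, using Python's standard `bisect` module. `build_index` and `contains` are hypothetical names, and the sketch assumes the items are mutually comparable and the list does not change after startup.

```python
import bisect


def build_index(items):
    """Pay the O(n log n) sorting cost once, e.g. during program startup."""
    return sorted(items)


def contains(index, target):
    """O(log n) membership test on the sorted list via binary search."""
    pos = bisect.bisect_left(index, target)
    return pos < len(index) and index[pos] == target


if __name__ == "__main__":
    index = build_index([42, 7, 19, 3, 88, 61])
    print(contains(index, 19))  # present in the data
    print(contains(index, 20))  # absent from the data
```

Each lookup takes about the same small number of steps whether or not the item is present, which is where the predictable response time comes from.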

Upvotes: 15

songlj

Reputation: 927

"So my question is if sorting then binary searching is better than simply linear searching"

Yes, you are right.

Binary search can only be applied once the array has been sorted. If you have a large number of queries, it would be better to sort the array first and then apply binary search. However, if you have just a few queries, linear search may be enough.

As for big O notation, it is always the "large" (dominant) part that counts: if you sort and then binary search once, it is O(n*lgn); if you just use linear search, it is O(n). But when the number of queries (m) is taken into consideration, the first approach is O(n*lgn + m*lgn) while the second becomes O(m*n). You can see that if m is large (m=n or m>>n), the second approach will be slower than sorting followed by binary search.
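To make the m = n case concrete, here is a hedged sketch that counts probes (element comparisons against the target) for hand-written versions of both searches. The data set and the helper names are made up for illustration, and the count for the binary search leaves out the cost of the sort itself.

```python
def linear_search(items, target):
    """Scan left to right; return (found, probes)."""
    probes = 0
    for value in items:
        probes += 1
        if value == target:
            return True, probes
    return False, probes


def binary_search(sorted_items, target):
    """Classic binary search on an ascending list; return (found, probes)."""
    lo, hi = 0, len(sorted_items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_items[mid] == target:
            return True, probes
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, probes


data = list(range(1000, 0, -1))  # n = 1000 values, not in ascending order
queries = list(range(1, 1001))   # m = n queries, every one present in data

linear_probes = sum(linear_search(data, q)[1] for q in queries)

sorted_data = sorted(data)       # one-time O(n log n) cost, not counted below
binary_probes = sum(binary_search(sorted_data, q)[1] for q in queries)

if __name__ == "__main__":
    print(linear_probes, binary_probes)
```

With n = 1000, each binary search needs at most 10 probes, so the total for m = n queries stays near n*lgn while the linear total grows like n*n.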

Upvotes: 2
