Reputation: 31
I was recently studying about binary and linear search, and decided to check for real what is the difference in the actual time taken by both of these searching algorithms. Here is my code:
#include <bits/stdc++.h>
using namespace std;
void Print(int arr[], int size) {
for (int i = 0; i < size; i++)
cout << arr[i] << " ";
cout << endl;
}
void Merge_sort(int arr[], int start, int end) {
if (start >= end) {
return;
}
int mid = (start + end) / 2;
Merge_sort(arr, start, mid);
Merge_sort(arr, mid + 1, end);
int size = end - start + 1;
int arr_new[size];
int i = start, j = mid + 1;
for (int k = 0; k < size; k++) {
if (i > mid) {
arr_new[k] = arr[j++];
} else if (j > end) {
arr_new[k] = arr[i++];
} else if (arr[i] < arr[j]) {
arr_new[k] = arr[i++];
} else {
arr_new[k] = arr[j++];
}
}
int index = start;
for (int k = 0; k < size; k++) {
arr[index++] = arr_new[k];
}
}
int binary_search(int arr[], int start, int end, int x) {
if (start > end) {
return -1;
}
int mid = (start + end) / 2;
if (arr[mid] == x) {
return mid;
} else if (arr[mid] > x) {
return binary_search(arr, start, mid - 1, x);
} else {
return binary_search(arr, mid + 1, end, x);
}
}
int linear_search(int arr[], int size, int x) {
for (int i = 0; i < size; i++) {
if (arr[i] == x) {
return i;
}
}
return -1;
}
int main() {
int size;
clock_t start, start1, end, end1;
double time1, time2;
cin >> size;
int *arr1 = new int[size];
int *arr2 = new int[size];
for (int i = 0; i < size; i++) {
int j = rand() % 1000;
arr1[i] = i;
arr2[i] = i;
}
Merge_sort(arr1, 0, size - 1);
start = clock();
cout << binary_search(arr1, 0, size - 1, 15) << endl;
end = clock();
time1 = (double)(end - start) / (double)(CLOCKS_PER_SEC);
cout << "Time taken by binary search is : \t" << fixed
<< time1 << setprecision(8);
cout << " sec " << endl;
start1 = clock();
cout << linear_search(arr2, size, 15) << endl;
end1 = clock();
time2 = (double)(end1 - start1) / (double)(CLOCKS_PER_SEC);
cout << "Time taken by linear search is : \t" << fixed
<< time2 << setprecision(8);
cout << " sec " << endl;
return 0;
}
I have used Merge sort to sort the array for the binary search part, but have excluded it while calculating the time taken. Kindly take a look and tell me where am I going wrong. As mentioned in a lot of comments, I am adding one result I got on my online IDE.
size = 10000, linear = 0.000005 sec, binary = 0.00005 sec
Upvotes: 1
Views: 1907
Reputation: 144540
The problem is very simple and a good C++ compiler points to it immediately with an adequate warning level:
c++ -O3 -Weverything -o binsearch binsearch.cpp
binsearch.cpp:81:13: warning: unused variable 'j' [-Wunused-variable]
int j = rand() % 1000;
This warning points to a major problem in this loop in the main()
function:
for (int i = 0; i < size; i++) {
int j = rand() % 1000;
arr1[i] = i;
arr2[i] = i;
}
rand()
is not used and both arrays are initialized to the same monotonic sequence from 0
to size-1
.
When you measure the time it takes to find the value 15
, the result will likely favor linear search because 15
appears in 16th position in both arrays, very close to the beginning, hence it is found very quickly in a linear search and with approximately the same number of steps for binary search in a 10000 element sorted array.
There are more problems:
You include the time take to format the output in the measurement, which is incorrect as it may dominate the benchmark. It is recommended to perform all measurements in silent mode and display the results afterwards.
You should also perform multiple searches: for numbers appearing at the beginning, middle and end of the array and for numbers not appearing in the arrays.
You should repeat the searches to get more meaningful timings for functions that run quickly because clock()
may have a limited granularity (1 microsecond on OS/X).
Here is a modified version with a benchmark for various sizes:
#include <bits/stdc++.h>
using namespace std;
static void Merge_sort(int arr[], int start, int end) {
if (start >= end) {
return;
}
int mid = start + (end - start) / 2;
Merge_sort(arr, start, mid);
Merge_sort(arr, mid + 1, end);
int size = end - start + 1;
int *arr_new = new int[size];
int i = start, j = mid + 1;
for (int k = 0; k < size; k++) {
if (i > mid) {
arr_new[k] = arr[j++];
} else if (j > end) {
arr_new[k] = arr[i++];
} else if (arr[i] < arr[j]) {
arr_new[k] = arr[i++];
} else {
arr_new[k] = arr[j++];
}
}
int index = start;
for (int k = 0; k < size; k++) {
arr[index++] = arr_new[k];
}
delete[] arr_new;
}
static int binary_search(int arr[], int start, int end, int x) {
if (start > end) {
return -1;
}
int mid = start + (end - start) / 2;
if (arr[mid] == x) {
return mid;
} else if (arr[mid] > x) {
return binary_search(arr, start, mid - 1, x);
} else {
return binary_search(arr, mid + 1, end, x);
}
}
static int linear_search(int arr[], int size, int x) {
for (int i = 0; i < size; i++) {
if (arr[i] == x) {
return i;
}
}
return -1;
}
static int test(int size) {
int *arr1 = new int[size];
int *arr2 = new int[size];
for (int i = 0; i < size; i++) {
int j = rand();
arr1[i] = j;
arr2[i] = j;
}
Merge_sort(arr1, 0, size - 1);
clock_t start1 = clock();
int index1[4] = {};
for (int i = 0; i < 100; i++) {
index1[0] = binary_search(arr1, 0, size - 1, arr1[0]);
index1[1] = binary_search(arr1, 0, size - 1, arr1[size / 2]);
index1[2] = binary_search(arr1, 0, size - 1, arr1[size - 1]);
index1[3] = binary_search(arr1, 0, size - 1, -1);
}
clock_t end1 = clock();
clock_t start2 = clock();
int index2[4] = {};
for (int i = 0; i < 100; i++) {
index2[0] = linear_search(arr2, size, arr1[0]);
index2[1] = linear_search(arr2, size, arr1[size / 2]);
index2[2] = linear_search(arr2, size, arr1[size - 1]);
index2[3] = linear_search(arr2, size, -1);
}
clock_t end2 = clock();
double time1 = (end1 - start1) * 1000.0 / CLOCKS_PER_SEC;
double time2 = (end2 - start2) * 1000.0 / CLOCKS_PER_SEC;
printf("size=%8d: binary searches: %8.3fms, %d %d %d %d\n",
size, time1, index1[0], index1[1], index1[2], index1[3]);
printf("size=%8d: linear searches: %8.3fms, %d %d %d %d\n",
size, time2, index2[0], index2[1], index2[2], index2[3]);
delete[] arr1;
delete[] arr2;
return 0;
}
int main() {
for (int size = 1; size <= 10000000; size *= 10) {
test(size);
}
return 0;
}
Output:
size= 1: binary searches: 0.002ms, 0 0 0 -1
size= 1: linear searches: 0.000ms, 0 0 0 -1
size= 2: binary searches: 0.002ms, 0 1 1 -1
size= 2: linear searches: 0.002ms, 0 1 1 -1
size= 5: binary searches: 0.002ms, 0 2 4 -1
size= 5: linear searches: 0.002ms, 3 0 4 -1
size= 10: binary searches: 0.003ms, 0 5 9 -1
size= 10: linear searches: 0.003ms, 9 7 1 -1
size= 20: binary searches: 0.004ms, 0 10 19 -1
size= 20: linear searches: 0.004ms, 15 18 5 -1
size= 50: binary searches: 0.004ms, 0 25 49 -1
size= 50: linear searches: 0.005ms, 44 24 0 -1
size= 100: binary searches: 0.006ms, 0 50 99 -1
size= 100: linear searches: 0.012ms, 34 91 0 -1
size= 200: binary searches: 0.006ms, 0 100 199 -1
size= 200: linear searches: 0.022ms, 62 123 58 -1
size= 500: binary searches: 0.006ms, 0 250 499 -1
size= 500: linear searches: 0.050ms, 47 389 252 -1
size= 1000: binary searches: 0.006ms, 0 500 999 -1
size= 1000: linear searches: 0.121ms, 902 808 422 -1
size= 2000: binary searches: 0.008ms, 0 1000 1999 -1
size= 2000: linear searches: 0.369ms, 733 1992 1866 -1
size= 5000: binary searches: 0.010ms, 0 2500 4999 -1
size= 5000: linear searches: 0.656ms, 2635 4605 1819 -1
size= 10000: binary searches: 0.015ms, 0 5000 9999 -1
size= 10000: linear searches: 1.137ms, 6722 8419 5128 -1
size= 20000: binary searches: 0.012ms, 0 10000 19999 -1
size= 20000: linear searches: 2.265ms, 16893 6637 10738 -1
size= 50000: binary searches: 0.013ms, 0 25000 49999 -1
size= 50000: linear searches: 4.538ms, 19705 40371 9661 -1
size= 100000: binary searches: 0.014ms, 0 50000 99999 -1
size= 100000: linear searches: 8.565ms, 42378 57447 33679 -1
size= 200000: binary searches: 0.020ms, 0 100000 199999 -1
size= 200000: linear searches: 26.003ms, 14037 198103 158988 -1
size= 500000: binary searches: 0.016ms, 0 250000 499999 -1
size= 500000: linear searches: 33.657ms, 162357 162899 18194 -1
As you can see, binary search is much faster in the general case for large sizes, but linear search is much simpler and has faster or equivalent timings for up to 20 elements. A non recursive version of binary_search might lower the threshold a little bit. Note however that your implementation of binary_search
does not always return the index of the first occurrence in case of duplicates.
Upvotes: 2
Reputation: 28808
The code in the question includes the time for std::cout to output data. Try using a non-recursive binary search:
int BinSrc(int a[], int cnt, int x)
{
int lo = 0;
int hi = cnt;
int i = 0;
while((hi - lo) > 1){
i = (lo + hi)/2;
if(x < a[i]){
hi = i;
continue;
}
if(x > a[i]){
lo = i;
continue;
}
break;
}
if(x == a[i])
return i;
return cnt;
}
Test program
#include <chrono>
#include <iostream>
#include <iomanip>
//----------------------------------------------------------------------//
// BinSrc //
//----------------------------------------------------------------------//
int BinSrc(int a[], int cnt, int x)
{
int lo = 0;
int hi = cnt;
int i = 0;
while((hi - lo) > 1){
i = (lo + hi)/2;
if(x < a[i]){
hi = i;
continue;
}
if(x > a[i]){
lo = i;
continue;
}
break;
}
if(x == a[i])
return i;
return cnt;
}
//----------------------------------------------------------------------//
// LinSrc //
//----------------------------------------------------------------------//
int LinSrc(int a[], int cnt, int x)
{
int i;
for(i = 0; i < cnt; i++)
if(x == a[i])
return i;
return cnt;
}
#define COUNT (10000)
#define GCC 0
int main()
{
int * a = new int [COUNT]; // allocate data array
std::chrono::high_resolution_clock::time_point binbeg, binend, linbeg, linend;
int i, j;
for(int i = 0; i < COUNT; i++)
a[i] = i;
#if GCC
linbeg = std::chrono::_V2::system_clock::now();
i = LinSrc(a, COUNT, COUNT-1);
linend = std::chrono::_V2::system_clock::now();
binbeg = std::chrono::_V2::system_clock::now();
j = BinSrc(a, COUNT, COUNT-1);
binend = std::chrono::_V2::system_clock::now();
#else
linbeg = std::chrono::steady_clock::now();
i = LinSrc(a, COUNT, COUNT-1);
linend = std::chrono::steady_clock::now();
binbeg = std::chrono::steady_clock::now();
j = BinSrc(a, COUNT, COUNT-1);
binend = std::chrono::steady_clock::now();
#endif
std::cout << "Time taken by linear search is : \t" << std::fixed << std::setprecision(8)
<< (std::chrono::duration_cast<std::chrono::nanoseconds>(linend - linbeg).count())/1000000000.;
std::cout << " sec " << std::endl;
std::cout << "Time taken by binary search is : \t" << std::fixed << std::setprecision(8)
<< (std::chrono::duration_cast<std::chrono::nanoseconds>(binend - binbeg).count())/1000000000.;
std::cout << " sec " << std::endl;
delete a;
if(i != j)
std::cout << "error" << std::endl;
return(0);
}
Upvotes: 0
Reputation: 584
If the input size is large enough, binary search is indeed faster than linear search. When computer scientists talk about one algorithm being faster than another one, they mean that the time complexity of one algorithm is better than the other one.
Binary search has worst time complexity O(lgn)
, where lgn
means 2-based logarithm of n.
Linear search has worst time complexity O(n)
, where n
.
In both of these cases, n
is the size of the input, means the size of the set of numbers you are searching.
Formally, O(g(n)) is defined as, The set of functions f(n) such that there exists a positive integer n_0 such that for n >= n_0, there exists a constant 0 <= f(n) <= c1*g(n)
All the mathematics and formal definitions aside, in summary, binary search is better than linear search in long term. By long term I mean if your input size is big enough. I am assuming you ran the program using less numbers, like 10 or maybe a 100. But when you have a large amount of data, say, 10^5 numbers, then binary search is always going to be faster.
Also, an algorithm is better than other usually means in worst case. That is, if the number you are searching for is in the beginning of array, then on the first try you will get that number. But for binary search, it will take a few number of steps.
For linear search, the worst case is if the number that you are searching for is at the end of the list.
So in worst cases and average cases, binary search performs better.
In your program, you have used recursion for binary searching and a for loop for linear search. Recursion has more overhead than a loop. And the compiler does some smart preprocessing when you are using a loop as @JosephLarson mentioned. But if you run your code on large enough inputs, I am assuming that you will find that binary search is faster.
Also, you can compare binary search to finding a specific page number in a book. Binary search will always be faster than going one page at a time.
Upvotes: 0
Reputation: 9058
I was just watching an interesting CPP Con video on YouTube about sorting, and part of it addressed linear versus binary searching.
Modern CPUs do some very interesting predictive work. They may guess ahead and can be a LOT more efficient if you follow the path they think your code is going to take. Binary searches make it impossible for that to work, so they can actually be slower on modern hardware where older hardware didn't do this, and thus binary searches were typically much faster as soon as the sample size grew to something over maybe as few as 20 values.
https://www.youtube.com/watch?v=FJJTYQYB1JQ&t=2272s
So to some extent, your sample size will matter, but this talk may also shed some light.
Upvotes: 4