Reputation: 71

Binary search vs linear search

I have the following implementation of binary search:

import time
numbers_one_to_hundred=[]
for num in range(1,100001,2): #25 66 35 64 (38,101) (1,101)
    numbers_one_to_hundred.append(num)

print('Please choose a number:')    
my_number=int(input())

print('Length array:',len(numbers_one_to_hundred))
#print('array:',numbers_one_to_hundred)
my_start_time = time.time()
start=0
end=len(numbers_one_to_hundred)
temp_arr=numbers_one_to_hundred
index_start=start
index_end=end
index_middle=int((index_end+index_start)/2)
found=False
while(not(start==0 and start==end)):
  if temp_arr[int((end+start)/2)]==my_number: #middle temp is my number
        print('you find',temp_arr[int((end+start)/2)],'index:',index_middle)
        found=True
        break
  elif my_number<temp_arr[int((end+start)/2)]: #number in first half
        end=int((start+end)/2)
        temp_arr=temp_arr[start:end] 
        start=0
        index_end=index_middle+1
        index_middle=int((index_middle+index_start)/2)
  elif my_number>temp_arr[int((end+start)/2)]: #number in second half
        start=int((end+start)/2)        
        temp_arr=temp_arr[start+1:end] 
        start=0
        end=int((end-1)/2) 
        index_start=index_middle+1
        index_middle=int((index_end+index_middle)/2)

if not(found): #number not in list
  print('Number not in list!')
  
my_end_time = time.time()

print('passed:',my_end_time-my_start_time)

When the input is for example 50003,this is the result:

Please choose a number:
50003
Length array: 50000
you find 50003 index: 25001
passed: 0.0028307437896728516

This is a code for linear search:

import time
numbers_one_to_hundred=[]
for num in range(1,100001,2): #25 66 35 64 (38,101) (1,101)
    numbers_one_to_hundred.append(num)

print('Please choose a number:')    
my_number=int(input())

print('Length array:',len(numbers_one_to_hundred))

my_start_time=time.time()  
if my_number in numbers_one_to_hundred:
    print('SUCCESS!')

else:
    print('NOT SUCCESS!')

my_end_time = time.time()    
print('passed:',my_end_time-my_start_time)

And result for this search is:

Please choose a number:
50003
Length array: 50000
SUCCESS!
passed: 0.0006299018859863281

My question is,why the advantage is to the linear search in this case?(and pretty big difference) Is there problem with one of the implementations? Or the input is the problem? (I tried 3-4 different input and in all of them the advantage was to the linear search) Or something else?

Upvotes: 1

Answers (3)

Samwise

Reputation: 71454

One possible explanation is that your data set isn't large enough for the algorithmic advantages of binary search to overcome the constant-time advantage of using a built-in search vs implementing it in the Python interpreter.

To start with, use a larger data set so that the linear search takes a significant amount of time:

import time
numbers = list(range(100000000))

print('Please choose a number:')    
my_number=int(input())

print('Length array:',len(numbers))

start_time = time.time()  
if my_number in numbers:
    print('SUCCESS!')
else:
    print('NOT SUCCESS!')

print('passed:', time.time() - start_time)

% python test.py
Please choose a number:
99999999
Length array: 100000000
SUCCESS!
passed: 0.9459362030029297

% python test.py
Please choose a number:
5
Length array: 100000000
SUCCESS!
passed: 0.0

The above output lets us confirm that this is indeed a linear search that takes a measurable amount of time -- searching for an item toward the beginning of the list is very fast, searching for an item toward the end of the list takes about a second.

Now let's do a proper comparison between three options -- the native in keyword, a linear search implemented via a for loop in the interpreter, and a binary search. (I'm going to quickly write my own binary search rather than copying and pasting yours -- hopefully this example gives you an indication of how you could compare the performance of different binary search implementations the same way I'm comparing the performance of different linear search implementations.)

import time
numbers = list(range(100000000))

def native_search(n):
    return n in numbers

def linear_search(n):
    for i in numbers:
        if i == n:
            return True
    return False

def binary_search(n):
    start, end = 0, len(numbers) - 1
    while True:
        mid = (start + end) // 2
        assert numbers[start] <= numbers[mid] <= numbers[end], "List isn't sorted!"
        if numbers[mid] == n:
            return True
        if start == end:
            return False
        if numbers[mid] > n:
            end = mid
        else:
            start = max(mid, start + 1)


print('Array length:', len(numbers))
print('Please choose a number:')    
my_number=int(input())

for func in (native_search, linear_search, binary_search):
    start_time = time.time()  
    print(f"{func.__name__}: {'NOT ' if not func(my_number) else ''}SUCCESS!")
    print('passed:', time.time() - start_time)

Array length: 100000000
Please choose a number:
99999999
native_search: SUCCESS!
passed: 0.9317595958709717
linear_search: SUCCESS!
passed: 2.556478500366211
binary_search: SUCCESS!
passed: 0.0009658336639404297

% python test.py
Array length: 100000000
Please choose a number:
50000000
native_search: SUCCESS!
passed: 0.47462892532348633
linear_search: SUCCESS!
passed: 1.4169631004333496
binary_search: SUCCESS!
passed: 0.0009737014770507812

We can see that the "native" linear search is about 2.5x faster than the version using the for loop -- this is because it's being done inside the Python interpreter (in compiled/optimized code) rather than in code that Python is interpreting as it goes.

But both of them are absolutely left in the dust by the binary search! The performance difference between my implementation and yours might be the temp_arr you use in your version -- if you eliminate that I suspect you'll see significant speedup.

Upvotes: 1

Michael Veksler

Reputation: 8475

You binary search copies parts of the list in lines such as temp_arr=temp_arr[start:end]. Since this copying happens in exponential decay parts (e.g. if original list is 128, then the next copy will be 64, and the one after that 32, etc), whose total cost is O(N) over the whole run of the algorithm.

EDIT: clarification regarding the total cost of the exponential decay mentioned above. The total cost of copying the list parts over all the iterations is N /2 + N /4 + N /8 + ... + 1, which is a geometric series, which equals N-1. Since usually the list's length is not a power of two, there will be a slight variation of the total cost, but it will still be O(N).

Although both the inefficient implementation of "binary search" and linear search, are O(N), the constant factor is higher for the "binary search" since it uses many more operations than linear search to achieve its goal. So overall your "binary search" is slower.

To make the code run faster, find a way to avoid creating new lists. A possible solution is to change the code such that start and end point to the original numbers_one_to_hundred list, without ever creating temp_arr

EDIT: Clarificaiton on improvments: Without copying partial lists, the binary search should be significantly faster for any list with enough elements. The cut-off size is likely between N=10 and N=100, when I'll guess the number to be closer to 20 for correctly implemented binary search in python.

Upvotes: 3

Ali

Reputation: 483

I think this is because binary search code is longer than linear search, their time is different, and linear search time is shorter because it has fewer operations. You can optimize binary search code and implement it with recursive method to have better performance.

Upvotes: 0

Binary search vs linear search

Answers (3)

Related Questions