user5597655
user5597655

Reputation:

Getting a binary search to work in Python

I am trying to get a binary search to work in Python. I have a massive, sorted list of passwords. The plan is to get a password input from the user and see if it is in the list. I've decided to implement a binary search because of the size of the list.

Here's my code:

Found = False
Password = user_input("Enter a password: ")


with io.open('final.txt', encoding='latin-1') as myfile:

    data = myfile.readlines()
    low = 0
    high = (int(len(data))+1)
    while (low < high) and not Found:

        mid = int((low+high)/2)

        if data[mid] == Password:
            Found = True
            break
        elif Password < str(data[mid]):
            high = mid - 1
        elif Password > str(data[mid]):
            low = mid + 1

I am guessing it is because of the string comparison? Any ideas? The binary search never returns true, even if I explicitly search something that I know is in the list.

I used this code to sort the password list.

import io

with io.open('result.txt', encoding='latin-1') as myfile:
    data = myfile.readlines()

def partition(data, start, end):
    pivot = data[end]                          # Partition around the last value
    bottom = start-1                           # Start outside the area to be partitioned
    top = end                                  # Ditto

    done = 0
    while not done:                            # Until all elements are partitioned...

        while not done:                        # Until we find an out of place element...
            bottom = bottom+1                  # ... move the bottom up.

            if bottom == top:                  # If we hit the top...
                done = 1                       # ... we are done.
                break

            if data[bottom] > pivot:           # Is the bottom out of place?
                data[top] = data[bottom]       # Then put it at the top...
                break                          # ... and start searching from the top.

        while not done:                        # Until we find an out of place element...
            top = top-1                        # ... move the top down.

            if top == bottom:                  # If we hit the bottom...
                done = 1                       # ... we are done.
                break

            if data[top] < pivot:              # Is the top out of place?
                data[bottom] = data[top]       # Then put it at the bottom...
                break                          # ...and start searching from the bottom.

    data[top] = pivot                          # Put the pivot in its place.
    return top                                 # Return the split point


def quicksort(data, start, end):
    if start < end:                            # If there are two or more elements...
        split = partition(data, start, end)    # ... partition the sublist...
        quicksort(data, start, split-1)
        quicksort(data, split+1, end)


quicksort(data, 0, (int(len(data))-1))

with io.open('final.txt', 'w', encoding='latin-1') as f:
    for s in data:
        f.write(s)

The sorted list looks something like this: whitespace, then symbols, then numbers, then capital letters (alphabetically sorted), then common letters (alphabetically sorted).

Upvotes: 0

Views: 1145

Answers (6)

Joost
Joost

Reputation: 4134

You're skipping parts of your list because of the way you're setting low and high. Because of this, low == high occurs after updating and before checking, causing you to jump out of the loop prematurely.

There are two easy solutions:

Either..

  • set high = mid or low = mid instead of mid -/+ 1, triggering an extra iteration,

or..

  • Check if high == low and data[low] == Password after the loop terminates, as you might still find Password there.

Upvotes: 0

KIDJourney
KIDJourney

Reputation: 1220

There are two problems .

  1. Your binary search algorithm is wrong .

The repeat condition should be

while (low <= high)

or your can't find the first and the last element .

  1. readlines() will read \n but user_input() does not .

Which causes `Password` == `Password\n' be false forever.

Upvotes: 0

Richard Rublev
Richard Rublev

Reputation: 8164

This is example of binary search

def binarySearch(alist, item):
        first = 0
        last = len(alist)-1
        found = False

        while first<=last and not found:
            midpoint = (first + last)//2
            if alist[midpoint] == item:
                found = True
            else:
                if item < alist[midpoint]:
                    last = midpoint-1
                else:
                    first = midpoint+1

        return found

mylist1 = [0, 1, 2, 8, 9, 17, 19, 32, 42,]
print(binarySearch(mylist1, 3))
print(binarySearch(mylist1, 13))

mylist2 = [0, 1, 2, 8, 9, 17, 19, 32, 42, 99]
print(binarySearch(mylist2, 2))
print(binarySearch(mylist2, 42))

I got then

False
False
True
True

Yes and I am sure that you need new line character at the end of each password after calling readlines,as Eamon pointed out.

Upvotes: 0

Eamonn McEvoy
Eamonn McEvoy

Reputation: 8986

You probably have a new line character at the end of each password after calling readlines, use rstrip() to remove it

    Found = False
    Password = user_input("Enter a password: ")


    with io.open('final.txt', encoding='latin-1') as myfile:

        data = myfile.readlines()
        low = 0
        high = len(data)-1   #no need to cast to int, should be len()-1
        while (low <= high) and not Found:  #less than or equal to

            mid = int((low+high)/2)

            if data[mid].rstrip() == Password:   #Remove newline character before compare
                Found = True
                break
            elif Password < str(data[mid]):
                high = mid - 1
            elif Password > str(data[mid]):
                low = mid + 1

Upvotes: 0

sumit
sumit

Reputation: 3720

If you only want to search the password in your list then In your code

data = myfile.readlines()

you have already taken all the passwords into the memory. so if you just want to check if a given password is present in your list or not, you can directly check by using

if Password in data:
     print "yes it is present in the list"
else:
    print "Not present in the list"

hope it may help.

Upvotes: 0

kfx
kfx

Reputation: 8537

Do not write your own binary search, it's a bit tricky to get them right. Use bisect module instead.

from bisect import bisect_left
def binary_search(lst, el):
   # returns lower bound of key `el` in list `lst`
   index = bisect_left(lst, el)
   # check that: (1) the lower bound is not at the end of the list and
   # (2) the element at the index matches `el`
   return index < len(lst) and lst[index] == el

Usage:

test = ["abc", "def", "ghi"]
print(binary_search(test, "def")) # True
print(binary_search(test, "xyz")) # False

Upvotes: 3

Related Questions