Reputation:
I am trying to get a binary search to work in Python. I have a massive, sorted list of passwords. The plan is to get a password input from the user and see if it is in the list. I've decided to implement a binary search because of the size of the list.
Here's my code:
Found = False
Password = user_input("Enter a password: ")
with io.open('final.txt', encoding='latin-1') as myfile:
data = myfile.readlines()
low = 0
high = (int(len(data))+1)
while (low < high) and not Found:
mid = int((low+high)/2)
if data[mid] == Password:
Found = True
break
elif Password < str(data[mid]):
high = mid - 1
elif Password > str(data[mid]):
low = mid + 1
I am guessing it is because of the string comparison? Any ideas? The binary search never returns true, even if I explicitly search something that I know is in the list.
I used this code to sort the password list.
import io
with io.open('result.txt', encoding='latin-1') as myfile:
data = myfile.readlines()
def partition(data, start, end):
pivot = data[end] # Partition around the last value
bottom = start-1 # Start outside the area to be partitioned
top = end # Ditto
done = 0
while not done: # Until all elements are partitioned...
while not done: # Until we find an out of place element...
bottom = bottom+1 # ... move the bottom up.
if bottom == top: # If we hit the top...
done = 1 # ... we are done.
break
if data[bottom] > pivot: # Is the bottom out of place?
data[top] = data[bottom] # Then put it at the top...
break # ... and start searching from the top.
while not done: # Until we find an out of place element...
top = top-1 # ... move the top down.
if top == bottom: # If we hit the bottom...
done = 1 # ... we are done.
break
if data[top] < pivot: # Is the top out of place?
data[bottom] = data[top] # Then put it at the bottom...
break # ...and start searching from the bottom.
data[top] = pivot # Put the pivot in its place.
return top # Return the split point
def quicksort(data, start, end):
if start < end: # If there are two or more elements...
split = partition(data, start, end) # ... partition the sublist...
quicksort(data, start, split-1)
quicksort(data, split+1, end)
quicksort(data, 0, (int(len(data))-1))
with io.open('final.txt', 'w', encoding='latin-1') as f:
for s in data:
f.write(s)
The sorted list looks something like this: whitespace, then symbols, then numbers, then capital letters (alphabetically sorted), then common letters (alphabetically sorted).
Upvotes: 0
Views: 1145
Reputation: 4134
You're skipping parts of your list because of the way you're setting low
and high
. Because of this, low == high
occurs after updating and before checking, causing you to jump out of the loop prematurely.
There are two easy solutions:
Either..
high = mid
or low = mid
instead of mid -/+ 1
, triggering an extra iteration,or..
high == low and data[low] == Password
after the loop
terminates, as you might still find Password
there.Upvotes: 0
Reputation: 1220
There are two problems .
The repeat condition should be
while (low <= high)
or your can't find the first and the last element .
\n
but user_input() does not .Which causes `Password` == `Password\n'
be false forever.
Upvotes: 0
Reputation: 8164
This is example of binary search
def binarySearch(alist, item):
first = 0
last = len(alist)-1
found = False
while first<=last and not found:
midpoint = (first + last)//2
if alist[midpoint] == item:
found = True
else:
if item < alist[midpoint]:
last = midpoint-1
else:
first = midpoint+1
return found
mylist1 = [0, 1, 2, 8, 9, 17, 19, 32, 42,]
print(binarySearch(mylist1, 3))
print(binarySearch(mylist1, 13))
mylist2 = [0, 1, 2, 8, 9, 17, 19, 32, 42, 99]
print(binarySearch(mylist2, 2))
print(binarySearch(mylist2, 42))
I got then
False
False
True
True
Yes and I am sure that you need new line character at the end of each password after calling readlines,as Eamon pointed out.
Upvotes: 0
Reputation: 8986
You probably have a new line character at the end of each password after calling readlines
, use rstrip()
to remove it
Found = False
Password = user_input("Enter a password: ")
with io.open('final.txt', encoding='latin-1') as myfile:
data = myfile.readlines()
low = 0
high = len(data)-1 #no need to cast to int, should be len()-1
while (low <= high) and not Found: #less than or equal to
mid = int((low+high)/2)
if data[mid].rstrip() == Password: #Remove newline character before compare
Found = True
break
elif Password < str(data[mid]):
high = mid - 1
elif Password > str(data[mid]):
low = mid + 1
Upvotes: 0
Reputation: 3720
If you only want to search the password in your list then In your code
data = myfile.readlines()
you have already taken all the passwords into the memory. so if you just want to check if a given password is present in your list or not, you can directly check by using
if Password in data:
print "yes it is present in the list"
else:
print "Not present in the list"
hope it may help.
Upvotes: 0
Reputation: 8537
Do not write your own binary search, it's a bit tricky to get them right. Use bisect
module instead.
from bisect import bisect_left
def binary_search(lst, el):
# returns lower bound of key `el` in list `lst`
index = bisect_left(lst, el)
# check that: (1) the lower bound is not at the end of the list and
# (2) the element at the index matches `el`
return index < len(lst) and lst[index] == el
Usage:
test = ["abc", "def", "ghi"]
print(binary_search(test, "def")) # True
print(binary_search(test, "xyz")) # False
Upvotes: 3