Reputation: 148
I have a method that generates 50,000 random strings, saves them all to a file, then runs through the file and deletes all duplicate strings. Out of those 50,000 random strings, after using set() to generate unique ones, on average only 63 are left.
Function to generate the strings:
def random_strings(size=8, chars=string.ascii_uppercase + string.digits + string.ascii_lowercase):
    return ''.join(random.choice(chars) for _ in xrange(size))
Delete duplicates:
with open("dicts/temp_dict.txt", "a+") as data:
    created = 0
    while created != 50000:
        string = random_strings()
        data.write(string + "\n")
        created += 1
        sys.stdout.write("\rCreating password: {} out of 50000".format(created))
        sys.stdout.flush()
    print "\nRemoving duplicates.."
    with open("dicts\\rainbow-dict.txt", "a+") as rewrite:
        rewrite.writelines(set(data))
Example of before and after: https://gist.github.com/Ekultek/a760912b40cb32de5f5b3d2fc580b99f
How can I generate completely random unique strings without duplicates?
Upvotes: 2
Views: 667
Reputation: 3582
You can use a set from the start:
created = set()
while len(created) < 50000:
    created.add(random_strings())
Then save once, outside the loop.
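For completeness, here is a minimal sketch of that idea end to end, written for Python 3 (so `range` instead of `xrange`). The names `random_string`, `unique_random_strings` and `save_strings` are hypothetical helpers for illustration, not part of the question's code, and the loop assumes `count` is no larger than the number of possible strings.

```python
import random
import string

CHARS = string.ascii_uppercase + string.digits + string.ascii_lowercase

def random_string(size=8, chars=CHARS):
    # Same idea as the question's generator, using Python 3's range.
    return ''.join(random.choice(chars) for _ in range(size))

def unique_random_strings(count, size=8):
    # A set silently ignores duplicates, so the loop only stops
    # once `count` distinct strings have been collected.
    created = set()
    while len(created) < count:
        created.add(random_string(size))
    return created

def save_strings(strings, path):
    # Write everything in one pass, one string per line.
    with open(path, "w") as handle:
        handle.writelines(s + "\n" for s in strings)

strings = unique_random_strings(50000)
print(len(strings))
```

Calling `save_strings(strings, "rainbow-dict.txt")` afterwards (the path here is just a placeholder) writes the result once, so nothing has to be read back from a half-written file to deduplicate it.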
Upvotes: 3
Reputation: 351359
You could guarantee unique strings by generating unique numbers, starting with a random number in a range that is 1/50000th of the total number of possibilities (62^8). Then generate more random numbers, each time determining the window in which the next number can be selected. This is not perfectly random, but I believe it's practically close enough.
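To make the window idea concrete, here is a small sketch that only produces the numbers, with toy parameters (5 numbers out of 100). The function name `spaced_unique_numbers` is made up for this illustration, and it assumes `total` is at least `count`.

```python
import random

def spaced_unique_numbers(count, total):
    """Pick `count` strictly increasing integers from range(total)."""
    highest = total - 1
    start = 0
    numbers = []
    for picked in range(count):
        remaining = count - picked
        # The window covers 1/remaining of the values from `start` upward.
        start = random.randint(start, start + (highest - start) // remaining)
        numbers.append(start)
        start += 1  # the next window begins past the value just taken
    return numbers

numbers = spaced_unique_numbers(5, 100)
print(numbers)
```

Because each window begins one past the previous pick, the numbers are strictly increasing and so can never repeat.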
Then these numbers can each be converted to a string by writing them as a base-62 number. Here is the code, and a test at the end to check that indeed all 50000 strings are unique:
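import string
import random

def random_strings(count, size=8, chars=string.ascii_uppercase + string.digits + string.ascii_lowercase):
    max_value = len(chars) ** size - 1  # largest number that fits in `size` digits
    start = 0
    choices = []
    for i in range(count):
        # pick the next number from a window covering 1/(count-i) of what is left
        start = random.randint(start, start + (max_value - start) // (count - i))
        # write the number as `size` digits in base len(chars)
        digits = []
        temp = start
        while len(digits) < size:
            temp, digit = divmod(temp, len(chars))
            digits.append(chars[digit])
        choices.append(''.join(digits))
        start += 1  # the next window starts past the number just used
    return choices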
choices = random_strings(50000)
# optional shuffle, since they are produced in order of `chars`
random.shuffle(choices)
# Test: output how many distinct values there are:
print (len(set(choices)))
See it run on repl.it
This produces your strings in linear time. With the above parameters you'll have the answer within a second on the average PC.
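If you would rather have the numbers drawn uniformly, a variation (not the windowed method above, but a different technique) is to let `random.sample` pick distinct numbers and reuse the same base-62 conversion. This is only a sketch for Python 3, where `random.sample` accepts a `range` without building it in memory; `encode_fixed_width` and `sample_unique_strings` are hypothetical names.

```python
import random
import string

CHARS = string.ascii_uppercase + string.digits + string.ascii_lowercase

def encode_fixed_width(number, size=8, chars=CHARS):
    """Write `number` as exactly `size` digits in base len(chars),
    least significant digit first."""
    digits = []
    for _ in range(size):
        number, index = divmod(number, len(chars))
        digits.append(chars[index])
    return ''.join(digits)

def sample_unique_strings(count, size=8, chars=CHARS):
    # random.sample draws without replacement, so the numbers are distinct.
    numbers = random.sample(range(len(chars) ** size), count)
    # Distinct numbers below len(chars) ** size map to distinct strings.
    return [encode_fixed_width(n, size, chars) for n in numbers]

strings = sample_unique_strings(50000)
print(len(set(strings)))  # 50000
```

Since the conversion is one-to-one for numbers below 62^8, uniqueness of the numbers carries over to the strings, and no shuffle is needed because the sample comes back in random order.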
Upvotes: 0