Reputation: 18353
I need to generate strings that meet the following requirements:
I will store them in a database after generation (they will be assigned to other entities).
My intention is to do something like this:
My concern with this algorithm is that it doesn't guarantee a result in finite time (if there are already A LOT of values in the database).
Question: could you please advise how to improve this algorithm so that it is more deterministic?
Thanks.
Upvotes: 4
Views: 6011
Reputation: 28738
The problem with your approach is that while you have few records you are very unlikely to get collisions, but as the number of records grows the chance increases until a collision becomes more likely than not. Eventually you will hit multiple collisions before you get a 'valid' result. Every attempt requires a lookup (a table scan, if the column isn't indexed) to determine whether the code is valid, and the whole thing turns into a mess.
The simplest solution is to precalculate your codes.
Start with the first code 00AAAAAA and increment to generate 00AAAAAB, 00AAAAAC ... 99ZZZZZZ. Insert them into a table in random order. When you need a new code, retrieve the top unused record from the table (then mark it as used). Be aware that the full space is not small: with the digits in fixed positions there are 100 × 26^6 ≈ 3.1 × 10^10 codes, so in practice you would precalculate only as many as you expect to need.
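A minimal in-memory sketch of the precalculation idea. The `ArrayDeque` stands in for the table and `poll()` stands in for "take the top unused record and mark it as used"; both are assumptions, not a schema. Because the full space is far too large to hold in memory, the sketch enumerates and shuffles only a batch of consecutive codes starting at a given index. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Random;

/** Sketch of "precalculate, shuffle, hand out"; a Deque stands in for the database table. */
class PrecalculatedCodes {
    static final long LETTER_SPACE = 308_915_776L;   // 26^6 letter combinations
    static final long TOTAL = 100L * LETTER_SPACE;   // 00AAAAAA .. 99ZZZZZZ

    /** Maps index 0, 1, 2, ... to 00AAAAAA, 00AAAAAB, 00AAAAAC, ... 99ZZZZZZ. */
    static String codeAt(long index) {
        if (index < 0 || index >= TOTAL) throw new IllegalArgumentException("index out of range");
        char[] out = new char[8];
        long digits = index / LETTER_SPACE;           // 00..99 is the most significant part
        long letters = index % LETTER_SPACE;
        out[0] = (char) ('0' + digits / 10);
        out[1] = (char) ('0' + digits % 10);
        for (int pos = 7; pos >= 2; pos--) {          // the last letter changes fastest
            out[pos] = (char) ('A' + letters % 26);
            letters /= 26;
        }
        return new String(out);
    }

    /** Generates `count` consecutive codes starting at `start` and returns them in random order. */
    static Deque<String> precalculate(long start, int count, Random rng) {
        List<String> batch = new ArrayList<>(count);
        for (long i = start; i < start + count; i++) batch.add(codeAt(i));
        Collections.shuffle(batch, rng);
        return new ArrayDeque<>(batch);
    }
}
```

For example, `precalculate(0, 1_000_000, new SecureRandom())` yields a million shuffled codes and `pool.poll()` hands out the next one. Note that a batch of consecutive indices shares its leading characters, so the codes are shuffled within the batch but not spread over the whole space.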
If you ever need more 'codes', just generate some more 'random' strings and append them to the table.
Upvotes: 0
Reputation: 189876
- it should be a unique string;
- the string length should be 8 characters;
- it should contain 2 digits;
- all symbols (non-digit characters) should be upper case.
Assuming:
Then your proposed method has two issues. One is that the letters A - Z are ASCII 65 - 90, not 64 - 89. The other is that it doesn't distribute the numbers evenly within the possible string space. That can be remedied by doing the following:
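The exact remedy did not survive here, so the following is only an assumed sketch of a generator without both problems, using the question's layout (two digits at random positions, six letters elsewhere). Letters are `'A'` plus 0..25, i.e. ASCII 65 to 90, and the two digit positions are drawn as a random ordered pair of distinct positions, which makes each of the 28 unordered pairs equally likely. The class name is hypothetical; pass a `java.security.SecureRandom` if the strings must be hard to guess.

```java
import java.util.Random;

class UniformCodeGenerator {
    /** Returns an 8-character string with exactly two digits and six upper-case letters. */
    static String next(Random rng) {
        // Pick two distinct digit positions; each unordered pair has probability 2/56 = 1/28
        int first = rng.nextInt(8);
        int second = rng.nextInt(7);
        if (second >= first) second++;                  // skip over `first` so the positions differ
        char[] out = new char[8];
        for (int pos = 0; pos < 8; pos++) {
            if (pos == first || pos == second) {
                out[pos] = (char) ('0' + rng.nextInt(10));
            } else {
                out[pos] = (char) ('A' + rng.nextInt(26)); // 'A' is 65, 'Z' is 90
            }
        }
        return new String(out);
    }
}
```

Every valid string then has the same probability, 1 / (28 × 100 × 26^6), so the output covers the string space evenly.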
There are 28 possibilities for the positions of the two digits ((8*8 - 8 duplicates) / 2 orderings), $26^6$ possibilities for the letters, and 100 possibilities for the digits, so the total number of valid combinations is $N_{comb} = 28 \times 26^6 \times 100 = 864964172800 \approx 8.65 \times 10^{11}$.
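The count can be double-checked with a few lines of Java (class and method names are illustrative). The brute-force loop confirms the 28 position pairs, and the product confirms the total.

```java
class CombinationCount {
    /** Counts unordered pairs of distinct positions among 8 by brute force. */
    static int digitPositionPairs() {
        int pairs = 0;
        for (int i = 0; i < 8; i++) {
            for (int j = i + 1; j < 8; j++) pairs++;
        }
        return pairs;                                   // (8*8 - 8) / 2 = 28
    }

    /** Position pairs x letter combinations x digit values. */
    static long total() {
        long letters = 1;
        for (int k = 0; k < 6; k++) letters *= 26;      // 26^6 = 308,915,776
        return digitPositionPairs() * letters * 100;    // digit values 00..99
    }
}
```

The highest power of two not exceeding the total is $2^{39}$, which is where the 39-bit figure comes from.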
edit: If you want to avoid the database for storage, but still guarantee that the strings are unique and cryptographically secure, your best bet is a cryptographically random bijection from a counter $n$ with $0 \le n < N_{max} \le N_{comb}$ to a subset of the space of possible output strings. (Bijection meaning there is a one-to-one correspondence between the output string and the input counter.)
This is possible with Feistel networks, which are commonly used in hash functions and symmetric cryptography (DES, for example, is a Feistel cipher; AES is not). You'd probably want to choose $N_{max} = 2^{39}$, which is the largest power of 2 $\le N_{comb}$, and use a 39-bit Feistel network (with unequal halves of 19 and 20 bits, since 39 is odd) keyed with a constant key that you keep secret. You then plug your counter into the Feistel network, and out comes another 39-bit number X, which you then transform into the corresponding string as follows:
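The answer's own mapping did not survive here. As an assumption, one bijective mapping treats X as a mixed-radix number: X mod 28 picks the pair of digit positions, the next factor of 100 picks the two digits, and the remaining value (below $26^6$) picks the six letters in base 26. It accepts any X in $[0, N_{comb})$, so it works for a 39-bit X as well as for a 40-bit X that has been checked against $N_{comb}$. The names are hypothetical.

```java
class IndexToCode {
    static final long LETTERS = 308_915_776L;          // 26^6
    static final long N_COMB = 28L * LETTERS * 100L;   // 864,964,172,800

    /** Bijectively maps x in [0, N_COMB) to a valid code by mixed-radix decoding. */
    static String decode(long x) {
        if (x < 0 || x >= N_COMB) throw new IllegalArgumentException("x out of range");
        int pairIndex = (int) (x % 28);                // which two positions hold digits
        x /= 28;
        int digits = (int) (x % 100);                  // 00..99
        x /= 100;                                      // now x < 26^6 encodes the letters
        // Find the pairIndex-th pair (i, j), i < j, in lexicographic order
        int first = 0, second = 1;
        for (int k = 0, i = 0; i < 8; i++) {
            for (int j = i + 1; j < 8; j++, k++) {
                if (k == pairIndex) { first = i; second = j; }
            }
        }
        char[] out = new char[8];
        for (int pos = 7; pos >= 0; pos--) {
            if (pos == second) {
                out[pos] = (char) ('0' + digits % 10);
            } else if (pos == first) {
                out[pos] = (char) ('0' + digits / 10);
            } else {
                out[pos] = (char) ('A' + x % 26);
                x /= 26;
            }
        }
        return new String(out);
    }
}
```

Because each factor (28, 100, 26^6) is consumed exactly once, different values of X always give different strings.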
Alternatively, use 40-bit numbers, and if the output of your Feistel network is $\ge N_{comb}$, increment the counter and try again. This covers the entire string space at the cost of rejecting invalid numbers (roughly 21% of them) and having to re-execute the algorithm. (But you don't need a database to do this.)
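A sketch of the 40-bit variant, for illustration only and not reviewed for security. It assumes a balanced Feistel network with two 20-bit halves, 8 rounds, and HMAC-SHA256 (via `javax.crypto.Mac`) as the round function; the round count, round function and class name are all assumptions. Any round function keeps the network a permutation of $[0, 2^{40})$, so each counter value maps to a distinct number, and outputs $\ge N_{comb}$ are skipped by advancing the counter.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Sketch only: a 40-bit balanced Feistel network applied to a counter. */
class FeistelCodes {
    static final long N_COMB = 864_964_172_800L;       // 28 * 26^6 * 100 valid strings
    static final int HALF_BITS = 20;                    // 40-bit block = two 20-bit halves
    static final long HALF_MASK = (1L << HALF_BITS) - 1;
    static final long BLOCK_SIZE = 1L << (2 * HALF_BITS);
    static final int ROUNDS = 8;                        // assumed round count

    private final Mac mac;                              // Mac is not thread-safe
    private long counter;                               // must be persisted in real use

    FeistelCodes(byte[] secretKey, long startCounter) {
        Mac m;
        try {
            m = Mac.getInstance("HmacSHA256");
            m.init(new SecretKeySpec(secretKey, "HmacSHA256"));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
        this.mac = m;
        this.counter = startCounter;
    }

    /** Round function: keyed hash of (round number, half), truncated to 20 bits. */
    private long round(int r, long half) {
        byte[] input = ByteBuffer.allocate(12).putInt(r).putLong(half).array();
        return ByteBuffer.wrap(mac.doFinal(input)).getLong() & HALF_MASK;
    }

    /** A permutation of [0, 2^40): distinct inputs always give distinct outputs. */
    long permute(long x) {
        long left = (x >>> HALF_BITS) & HALF_MASK;
        long right = x & HALF_MASK;
        for (int r = 0; r < ROUNDS; r++) {
            long next = left ^ round(r, right);         // (L, R) -> (R, L xor F(R))
            left = right;
            right = next;
        }
        return (left << HALF_BITS) | right;
    }

    /** Returns the next value in [0, N_COMB), advancing the counter past rejected outputs. */
    long next() {
        while (counter < BLOCK_SIZE) {
            long x = permute(counter++);
            if (x < N_COMB) return x;
        }
        throw new IllegalStateException("counter exhausted");
    }
}
```

The number returned by `next()` still has to be turned into a string with some fixed bijective mapping, for example a mixed-radix split into digit positions, digits and letters.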
But this isn't something to get into unless you know what you're doing.
Upvotes: 6
Reputation: 68046
For one thing, your list of requirements doesn't state that the string has to be random, so you might consider deriving it from something like a database index (e.g. an auto-incrementing ID).
If 'random' is a requirement, you can make a few improvements.
E.g., if we have the sequence 1, 2, 3, 4, ... and apply a cyclic binary shift right by 1 bit, it turns into 4, 1, 5, 2, ... (assuming we have only 3 bits). It doesn't have to be a shift, either: it can be a permutation or any other 'randomization'.
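The 3-bit example can be reproduced with a small helper (a hypothetical name, not a library call) that rotates the low `bits` bits right by one. Rotation is a bijection on $[0, 2^{bits})$, so distinct sequence numbers always map to distinct outputs, which is what keeps the codes unique. On its own it is trivially reversible, so it only hides the sequence from casual inspection.

```java
class BitRotation {
    /** Rotates the lowest `bits` bits of x right by one position (a bijection on [0, 2^bits)). */
    static long rotateRight1(long x, int bits) {
        long mask = (1L << bits) - 1;
        x &= mask;
        // The lowest bit moves to the top; everything else shifts down by one
        return ((x >>> 1) | ((x & 1L) << (bits - 1))) & mask;
    }
}
```

With `bits = 3`, the inputs 1, 2, 3, 4 map to 4, 1, 5, 2, matching the example.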
Upvotes: 0
Reputation: 49804
Do it the other way around: generate one big random number that you will split up to obtain the individual characters:
long bigRandom = ...;
int firstDigit = (int) (bigRandom % 10);
int secondDigit = (int) ((bigRandom / 10) % 10);
and so on.
Then you only store the random number in your database and not the string. Since there's a one-to-one relationship between the string and the number, this doesn't really make a difference.
However, when you try to insert a new value and it's already in the database, you can easily find the smallest unallocated number greater than the originally generated number, and use that instead of the one you generated.
What you gain from this method is that you're guaranteed to find an available code relatively quickly, even when most codes are already allocated.
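A sketch of the "next free number" step, with a `TreeSet` standing in for the database table (an assumption; against a real table you would query for the first gap above the candidate instead). `toCode` splits the number the way the code above does, two decimal digits first and then six base-26 letters, in a hypothetical DDLLLLLL layout. Wrapping around to 0 at the top of the space is an addition not mentioned above. The walk visits one number per step, so long runs of allocated numbers make it slower.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

/** Sketch: an in-memory TreeSet stands in for the table of allocated numbers. */
class NumberAllocator {
    static final long SPACE = 100L * 308_915_776L;     // two digits times 26^6 letter combinations
    private final NavigableSet<Long> allocated = new TreeSet<>();

    /** Splits n into two digits and six letters (the DDLLLLLL layout is an assumption). */
    static String toCode(long n) {
        StringBuilder sb = new StringBuilder();
        sb.append((char) ('0' + n % 10));              // firstDigit
        sb.append((char) ('0' + (n / 10) % 10));       // secondDigit
        n /= 100;
        for (int i = 0; i < 6; i++) {
            sb.append((char) ('A' + n % 26));
            n /= 26;
        }
        return sb.toString();
    }

    /** Allocates `candidate`, or the smallest free number above it (wrapping around at SPACE). */
    long allocate(long candidate) {
        if (allocated.size() >= SPACE) throw new IllegalStateException("all codes used");
        long n = candidate;
        while (allocated.contains(n)) n = (n + 1) % SPACE;
        allocated.add(n);
        return n;
    }
}
```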
Upvotes: 0
Reputation: 17124
Are these user passwords? If so, there are a couple of things you need to take into account:
As far as 2 is concerned, you can avoid the problem by using LLNLLNLL as your pattern (L = letter, N = number).
If you need 1 million passwords out of a pool of 2.5 billion, you will certainly get clashes in your database, so you have to deal with them gracefully. But a simple retry is enough, if your random number generator is robust.
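A minimal sketch of the LLNLLNLL pattern with a plain retry on collision. The `HashSet` stands in for the database (where a unique constraint would reject the duplicate insert), and the class name is hypothetical. The default constructor uses `SecureRandom`, which matters if these really are passwords.

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Sketch: LLNLLNLL codes with retry on collision; a HashSet stands in for the database. */
class PatternCodes {
    static final String PATTERN = "LLNLLNLL";          // L = letter, N = number
    private final Set<String> issued = new HashSet<>();
    private final Random rng;

    PatternCodes() { this(new SecureRandom()); }
    PatternCodes(Random rng) { this.rng = rng; }

    static String generate(Random rng) {
        char[] out = new char[PATTERN.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = PATTERN.charAt(i) == 'N'
                    ? (char) ('0' + rng.nextInt(10))
                    : (char) ('A' + rng.nextInt(26));
        }
        return new String(out);
    }

    /** Retries until the code has not been issued before. */
    String issue() {
        String code;
        do {
            code = generate(rng);
        } while (!issued.add(code));                    // add() returns false on a clash
        return code;
    }
}
```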
Upvotes: 2
Reputation: 67524
I think you're safe well into the tens of thousands of such IDs, and even after that you're most likely all right.
Now if you want some determinism, you can always force a password after a certain number of failures. Say after 50 failures, you select a password at random and increment a part of it by 1 until you get a free one.
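The capped-retry idea can be sketched as follows, treating codes as numbers in [0, space) that are mapped to strings elsewhere. `isTaken` is a hypothetical stand-in for the database lookup, and "increment a part of it" is simplified to incrementing the whole number with wrap-around. After at most 50 random tries plus one full sweep the method either returns a free value or reports that none is left, so it always terminates.

```java
import java.util.Random;
import java.util.function.LongPredicate;

class BoundedRetry {
    static final int MAX_RANDOM_TRIES = 50;            // "say after 50 failures"

    static long pick(long space, Random rng, LongPredicate isTaken) {
        // Phase 1: plain random tries (floorMod has a tiny bias, negligible next to 2^64)
        for (int i = 0; i < MAX_RANDOM_TRIES; i++) {
            long candidate = Math.floorMod(rng.nextLong(), space);
            if (!isTaken.test(candidate)) return candidate;
        }
        // Phase 2: start at a random value and increment until a free one is found
        long candidate = Math.floorMod(rng.nextLong(), space);
        for (long steps = 0; steps < space; steps++) {
            if (!isTaken.test(candidate)) return candidate;
            candidate = (candidate + 1) % space;
        }
        throw new IllegalStateException("every value is taken");
    }
}
```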
I'm willing to bet money, though, that you'll never see the extra functionality kick in during your lifetime :)
Upvotes: 0
Reputation: 35828
I don't see anything in your requirements that states that the string needs to be random. You could just do something like the following pseudocode:
for letters in ( 'AAAAAA' .. 'ZZZZZZ' ) {
    for numbers in ( '00' .. '99' ) {
        string = letters + numbers   # AAAAAA00, AAAAAA01, ..., ZZZZZZ99
        output( string )
    }
}
This will create unique strings eight characters long, with two digits and six upper-case letters.
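If the strings are issued one at a time, the same order as the pseudocode (AAAAAA00, AAAAAA01, ..., AAAAAB00, ...) can be produced with an odometer-style successor function, so only the last issued code needs to be stored. This is a sketch with a hypothetical name; it assumes the input is already a well-formed LLLLLLNN code and does not validate it.

```java
class SequentialCodes {
    /**
     * Returns the code after `code` in the order AAAAAA00, AAAAAA01, ..., AAAAAA99, AAAAAB00, ...
     * or null after ZZZZZZ99. The digits roll over first, then the letters.
     */
    static String nextCode(String code) {
        char[] c = code.toCharArray();
        for (int pos = c.length - 1; pos >= 0; pos--) {
            boolean digit = pos >= 6;                   // positions 6 and 7 hold the digits
            char max = digit ? '9' : 'Z';
            if (c[pos] != max) {
                c[pos]++;
                return new String(c);
            }
            c[pos] = digit ? '0' : 'A';                 // roll over and carry to the left
        }
        return null;                                    // past ZZZZZZ99: the space is exhausted
    }
}
```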
If you need randomly generated strings, then you need to keep some kind of record of which strings have been previously generated, so you're going to have to hit a DB (or keep them all in memory, or write them to a text file) and check against that list.
Upvotes: 0