Reputation: 899
I need to create unique, random-looking alphanumeric IDs of a set length. Ideally I would store a counter in my database starting at 0. Every time I need a unique ID, I would read the counter value (0), run it through a hashing function with a set output length (probably 4-6 characters), e.g. ID = Hash(Counter, 4), which would return my new ID (e.g. 7HU9), and then increment the counter (0 becomes 1).
I need to keep the IDs short so they can be remembered or shared easily. Security isn't a big issue, so I'm not worried about people trying random IDs, but I don't want the IDs to be predictable. A user should not be able to notice that the IDs increment by 3 every time and then work backwards through them, downloading the ID data one by one (e.g. A5F9, A5F6, A5F3, A5F0 == BAD).
I don't want to just loop through random strings checking for uniqueness, since this would increase database load over time as keys are used up. The intention is that hashing a unique incrementing counter would guarantee ID uniqueness up to a certain counter value, at which point the length of the generated IDs would be increased by one and the counter reset, continuing this pattern forever.
Does anybody know of any hashing functions which would suit this need, or have any other ideas?
Edit: I do not need to be able to reverse the function to get the counter value back.
Upvotes: 0
Views: 469
Reputation: 899
(This was a while ago but I should write up what I ended up doing...)
The idea I came up with was actually pretty simple. I wanted alphanumeric pins, so that works out to 36 possible values for each character, and I wanted to start with 4-character pins, so that works out to 36^4 = 1,679,616 possible pins. I realized that all I wanted to do was take all of these possible pins and throw away a percentage of them in a random way, such that a human being had a low chance of randomly finding one. So I divide 1,679,616 by 100, then multiply my counter by a random number between 1 and 100, and then encode that number as my alphanumeric pin. Problem solved!
Anyone guessing a random combination of 4 letters and numbers has a 1 in 100 chance of hitting a real in-use pin, which is all I really wanted. In my implementation I increment the pin length once the available pin space is exhausted, and everything worked perfectly! Been running for about 2 years now!
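A minimal sketch of one way to read the scheme above, not the author's actual code. Taken literally, multiplying the counter by a random factor could map two counters to the same number (2 x 3 and 3 x 2), so this sketch assumes a variant that keeps uniqueness: each counter value owns a block of 100 consecutive numbers and a random offset is picked inside that block. The names `ALPHABET`, `BLOCK`, `encode`, and `pin_for` are hypothetical.

```python
import secrets

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # 36 symbols
PIN_LENGTH = 4
SPACE = len(ALPHABET) ** PIN_LENGTH  # 36**4 = 1,679,616 possible pins
BLOCK = 100                          # assumption: one real pin per block of 100
CAPACITY = SPACE // BLOCK            # counters 0 .. CAPACITY-1 fit in 4 characters


def encode(value, length=PIN_LENGTH):
    """Encode a non-negative integer as a fixed-length base-36 string."""
    chars = []
    for _ in range(length):
        value, digit = divmod(value, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def pin_for(counter):
    """Return a pin for this counter value; distinct counters cannot collide."""
    if not 0 <= counter < CAPACITY:
        raise ValueError("pin space exhausted; a longer pin length is needed")
    # Block `counter` covers [counter*BLOCK, counter*BLOCK + BLOCK - 1];
    # blocks never overlap, so the random offset cannot cause a collision.
    value = counter * BLOCK + secrets.randbelow(BLOCK)
    return encode(value)
```

Under this assumption, pins still grow with the counter, so the leading characters change slowly from one pin to the next; only the position within each block is random.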
Upvotes: 0
Reputation: 733
Let's say your counter ranges from 1 to 10000. Slice [1, 10000] into 10 small units, each containing 1000 numbers. Each unit keeps track of the last ID it handed out.
unit-1 unit-2 unit-10
[1, 1000], [1001, 2000], ... , [9001, 10000]
When you need an ID, randomly select one of units 1-10 and take that unit's next unused ID. For example: at first your counter is 1 and the random selection is unit-2, so you get ID=1001. The second time, your counter is 2 and the random selection is unit-1, so you get ID=1. The third time, your counter is 3 and the random selection is unit-2, so you get ID=1002; and so on.
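The unit scheme above could be sketched roughly as follows. The class name `UnitAllocator` and its parameters are hypothetical, and the sketch adds one assumption the answer does not spell out: units that have run out of IDs are skipped when choosing at random.

```python
import random


class UnitAllocator:
    """Hand out IDs from [1, total] by picking a random unit each time."""

    def __init__(self, total=10_000, units=10, rng=None):
        if total % units:
            raise ValueError("total must divide evenly into units")
        self.size = total // units
        # next_id[u] is the next unused ID of unit u (0-based unit index);
        # unit u covers [u*size + 1, (u+1)*size].
        self.next_id = [u * self.size + 1 for u in range(units)]
        self.rng = rng or random.Random()

    def allocate(self):
        # Assumption: only units that still have unused IDs are candidates.
        open_units = [u for u, nxt in enumerate(self.next_id)
                      if nxt <= (u + 1) * self.size]
        if not open_units:
            raise RuntimeError("all IDs have been used")
        u = self.rng.choice(open_units)
        new_id = self.next_id[u]
        self.next_id[u] += 1
        return new_id
```

In a real deployment the per-unit positions would live in the database next to the counter; within a single unit the IDs are still handed out in increasing order.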
Upvotes: 1
Reputation: 46970
The tough part, as you realize, is guaranteeing a collision-free sequence.
If "not obvious" is the standard you need for guessing the algorithm, a simple mixed congruential RNG of full period - or rather a sequence of them with increasing modulus to satisfy the requirement for growth over time - might be what you want. This is not the hash approach you're asking for, but it ought to work.
This presentation covers the basics of MCRNGs and sufficient conditions for full period in a very concise form. There are many others.
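For reference, the standard full-period conditions for a mixed congruential generator x -> (a*x + c) mod m (the Hull-Dobell theorem) are: c is coprime to m, a - 1 is divisible by every prime factor of m, and a - 1 is divisible by 4 whenever m is. A small sketch that checks them; the function names are hypothetical:

```python
from math import gcd


def prime_factors(n):
    """Return the set of prime factors of n (trial division)."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def has_full_period(a, c, m):
    """Check the Hull-Dobell conditions for x -> (a*x + c) % m."""
    if gcd(c, m) != 1:
        return False
    if any((a - 1) % p for p in prime_factors(m)):
        return False
    if m % 4 == 0 and (a - 1) % 4:
        return False
    return True


def cycle(a, c, m, seed=0):
    """Yield m successive outputs of the generator, starting at seed."""
    x = seed % m
    for _ in range(m):
        yield x
        x = (a * x + c) % m
```

For example, `has_full_period(5, 3, 16)` holds, and `cycle(5, 3, 16)` visits every value in 0..15 exactly once before repeating.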
You'd first use the lowest-modulus MCRNG starting with an arbitrary seed until you've "used up" its cycle, and then advance to the next larger modulus.
You will want to "step" the moduli to ensure uniqueness. For example, if your first IDs are 12 bits, you have a modulus M1 <= 2^12 (but not much less). When you then advance to 16 bits, you'd want to pick the second modulus M2 <= 2^16 - M1, so the second tier of IDs would be M1 + x_i, where x_i is the i'th output of the second RNG. A 32-bit third tier would have modulus M3 <= 2^32 - (M1 + M2), and its IDs would be M1 + M2 + y_i, where y_i is the i'th output of the third RNG, etc.
The only persistent storage required will be the last ID generated and the index of the MCRNG in the sequence.
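A rough sketch of the tiered idea with two tiers (12-bit, then 16-bit). All constants here are illustrative assumptions chosen to satisfy the full-period conditions, not values from the answer; each tier's offset is the total size of the tiers before it, so the ID ranges cannot overlap. The only state passed around is the last ID and the tier index.

```python
# Each tier: (offset, a, c, m, seed). IDs in a tier are offset + x, x in [0, m).
# Assumed constants: c is coprime to m, and a - 1 is divisible by every
# prime factor of m and by 4, so each generator has full period m.
TIERS = [
    (0,     1229,  2731, 2**12,         1234),  # m = 4096
    (2**12, 20341, 7919, 2**16 - 2**12, 4321),  # m = 61440 = 2^12 * 3 * 5
]


def first_id():
    """Return (id, tier) for the very first ID."""
    offset, a, c, m, seed = TIERS[0]
    return offset + seed, 0


def next_id(last_id, tier):
    """Advance from the stored (last_id, tier) to the next (id, tier)."""
    offset, a, c, m, seed = TIERS[tier]
    x = (a * (last_id - offset) + c) % m
    if x == seed:
        # Back at the seed: this tier's cycle is used up, move to the next.
        tier += 1
        if tier == len(TIERS):
            raise RuntimeError("all tiers exhausted; add a larger tier")
        offset, a, c, m, seed = TIERS[tier]
        x = seed
    return offset + x, tier
```

Starting from `first_id()` and repeatedly calling `next_id` walks through all 4096 first-tier values and then all 61440 second-tier values, covering 0..65535 without repeats.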
Someone with time on their hands could guess this algorithm without too much trouble. But a casual user would be unlikely to do so.
Upvotes: 1