Reputation: 79278
Say you want to have a set of 1- to 2-digit hexadecimal numbers, so 256 numbers. Just using a small set to get at the problem, but it would work with any sized string.
So you have a potential N
or 256 numbers in this case. You are going to "generate" a new ID for every new data record that comes your way. So it starts of and randomly gives you af
, then 1d
, then 8a
, etc.
The straightforward naïve way to do this is to just simply generate all the numbers in order, then shuffle them, and just pop from the set. This works fine when you only have 256 numbers. But if you have millions or billions of numbers it is impractical as you might have a lot of waisted generated IDs not being used for long periods of time. I would like to avoid this.
So my question is what is the or a fastest way to create a unique key string like this, without generating all of them in advance, and without going in order just incrementing by 1 or whatnot. That is, the key should be seemingly random.
One way I can imagine is using a trie to store the already used/generated values. Then when you are to get a new value, you generate a random one, then check the trie to see if it's already used. I have no idea how to tell how efficient this is though, but it seems like it would be very bad performing once you start running out of IDs and are down to the last few ones in the set. You would generate lots of already generated IDs, and traverse the trie for each, so it would be slow.
I am wondering if there is a more efficient way of doing this, without generating them all in advance. Also, the data records won't be used in figuring out the ID, as the records might be extremely large and complex.
Maybe there is a way to sort of randomly traverse (and generate) a trie at once, and in that way generate the ID since you end up at a unique random place in the trie. Something along those lines perhaps, I don't know.
Also, I am not sophisticated with hashing so I don't know if there would be any good methods with that.
Upvotes: 3
Views: 2272
Reputation: 241771
I assume that you could generate sequential IDs; that is, that you have a reliable way of knowing exactly how many IDs have been generated to date. Then it is sufficient to encrypt this count with any reasonably fast encryption algorithm.
The encryption would be done on the count as a binary number, and the encrypted result with most algorithms would be the same size, also binary. If desired, you could base-64 or hex encode the result to make it easier to use as a character string.
Since encryption must be a bijection (that is, a one-to-one mapping) in order for decryption to be possible, this is guaranteed to produce a different result each time until the total ID count overflows. If it is a reasonable encryption function, then the result will appear random (otherwise the cipher would be vulnerable).
Upvotes: 2
Reputation: 79278
I am considering something like this:
var trie = buildTrie()
var id1 = genId(trie)
var id2 = genId(trie)
console.log(id1,id2)
function buildTrie() {
var trie = buildNode(0)
return trie
function buildNode(level) {
if (level == 7) { // 8 bits
var node = {
available: true,
leaf: true
}
return node
} else {
var a = buildNode(level + 1)
var b = buildNode(level + 1)
var node = {
availableLeft: true,
availableRight: true,
left: a,
right: b
}
a.parent = node
b.parent = node
return node
}
}
}
function genId(node) {
var bytes = []
step(node, bytes)
var id = parseInt(bytes.join(''), 2).toString(16)
return id
function step(node, bytes) {
if (node.leaf) {
node.available = false
var c = node
var p = c.parent
while (p) {
if (p.left == c) {
p.availableLeft = false
} else if (p.right == c) {
p.availableRight = false
}
if (!p.availableLeft && !p.availableRight) {
c = p
p = p.parent
} else {
p = false
}
}
}
var randomDirection = Math.random() >= 0.5
if (randomDirection) {
if (node.availableLeft) {
bytes.push(0)
step(node.left, bytes)
} else if (node.availableRight) {
bytes.push(1)
step(node.right, bytes)
}
} else {
if (node.availableRight) {
bytes.push(1)
step(node.right, bytes)
} else if (node.availableLeft) {
bytes.push(0)
step(node.left, bytes)
}
}
}
}
Maybe there is a better way.
Upvotes: 0
Reputation: 37755
I am not sure how performant it will be but my idea is use a object
or Map
and Math.random()
let obj = {}
function generateRandomId(){
let id = Math.abs( 0.5 - Math.random()) * 1000
if(obj[id]){
generateRandomId()
} else {
obj[id] = true
}
return id
}
console.log(generateRandomId())
console.log(generateRandomId())
console.log(generateRandomId())
console.log(generateRandomId())
But if you are ok with using a modules i find this one is most useful
uuid this generates RFC4122 UUIDS.
Upvotes: 1
Reputation: 10390
I think a mixing function is what you want. It will move bits around in your input to produce an output of the same length. It's reversible so each input corresponds to a unique output.
Since you want the input data to not take part in the id generation, you'll need a surrogate id. You can assign an incrementing id to each record and use the mix function to scramble the id.
You will get something like:
id == 1
=> mixed id == 0x7ed55d16
id == 2
=> mixed id == 0xc761c23c
See here for some inspiration:
Upvotes: 0
Reputation: 5914
I think that there should be some tradeoff between speed, flexibility and efficiency.
On one had pseudo random generators will give you that even distribution of keys and will be reasonably fast to generate. However checking for an existing id would be slow. You can use bloom filters (saving memory) or tries but then as you said at some point you should increase the space.
Another option is to use Gray code which will produce every key (but not at random order). You need to keep track of the last issued code.
Upvotes: 0