Reputation: 2800
I'm using MongoDB, and I would like to generate unique and cryptical IDs for blog posts (that will be used in restful URLS) such as s52ruf6wst or xR2ru286zjI.
What do you think is best and the more scalable way to generate these IDs ?
I was thinking of following architecture :
WDYT ?
Upvotes: 12
Views: 20313
Reputation: 1166
The "correct" answer, which is not really a great solution IMHO, is to generate a random ID, and then check the DB for a collision. If it is a collision, do it again. Repeat until you've found an unused match. Most of the time the first will work (assuming that your generation process is sufficiently random).
It should be noted that, this process is only necessary if you are concerned about the security implications of a time-based UUID, or a counter-based ID. Either of these will lead to "guessability", which may or may not be an issue in any given situation. I would consider a time-based or counter-based ID to be sufficient for blog posts, though I don't know the details of your situation and reasoning.
Upvotes: -1
Reputation: 601
This is an old question but for anyone who could be searching for another solution.
One way is to use simple and fast substitution cipher. (The code below is based on someone else's code -- I forgot where I took it from so cannot give proper credit.)
class Array
def shuffle_with_seed!(seed)
prng = (seed.nil?) ? Random.new() : Random.new(seed)
size = self.size
while size > 1
# random index
a = prng.rand(size)
# last index
b = size - 1
# switch last element with random element
self[a], self[b] = self[b], self[a]
# reduce size and do it again
size = b;
end
self
end
def shuffle_with_seed(seed)
self.dup.shuffle_with_seed!(seed)
end
end
class SubstitutionCipher
def initialize(seed)
normal = ('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a + [' ']
shuffled = normal.shuffle_with_seed(seed)
@map = normal.zip(shuffled).inject(:encrypt => {} , :decrypt => {}) do |hash,(a,b)|
hash[:encrypt][a] = b
hash[:decrypt][b] = a
hash
end
end
def encrypt(str)
str.split(//).map { |char| @map[:encrypt][char] || char }.join
end
def decrypt(str)
str.split(//).map { |char| @map[:decrypt][char] || char }.join
end
end
You use it like this:
MY_SECRET_SEED = 3429824
cipher = SubstitutionCipher.new(MY_SECRET_SEED)
id = hash["_id"].to_s
encrypted_id = cipher.encrypt(id)
decrypted_id = cipher.decrypt(encrypted_id)
Note that it'll only encrypt a-z, A-Z, 0-9 and a space leaving other chars intact. It's sufficient for BSON ids.
Upvotes: 0
Reputation: 2419
What about using UUIDs?
http://www.famkruithof.net/uuid/uuidgen as an example.
Upvotes: 3
Reputation: 27080
This is exactly why the developers of MongoDB constructed their ObjectID's (the _id) the way they did ... to scale across nodes, etc.
A BSON ObjectID is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter. Note that the timestamp and counter fields must be stored big endian unlike the rest of BSON. This is because they are compared byte-by-byte and we want to ensure a mostly increasing order. Here's the schema:
0123 456 78 91011
time machine pid inc
Traditional databases often use monotonically increasing sequence numbers for primary keys. In MongoDB, the preferred approach is to use Object IDs instead. Object IDs are more synergistic with sharding and distribution.
http://www.mongodb.org/display/DOCS/Object+IDs
So I'd say just use the ObjectID's
They are not that bad when converted to a string (these were inserted right after each other) ...
For example:
4d128b6ea794fc13a8000001
4d128e88a794fc13a8000002
They look at first glance to be "guessable" but they really aren't that easy to guess ...
4d128 b6e a794fc13a8000001
4d128 e88 a794fc13a8000002
And for a blog, I don't think it's that big of a deal ... we use it production all over the place.
Upvotes: 32
Reputation: 7619
Make a web service that returns a globally-unique ID so that you can have many webservers participate and know you won't hit any duplicates?
If your daily batch didn't allocate enough items? Do you run it midday?
I would implement the web-service client as a queue that can be looked at by a local process and refilled as needed (when server is slower) and could keep enough items in queue not to need to run during peak usage. Makes sense?
Upvotes: 1