Chris

Reputation: 2800

MongoDB custom and unique IDs

I'm using MongoDB, and I would like to generate unique, cryptic IDs for blog posts (to be used in RESTful URLs), such as s52ruf6wst or xR2ru286zjI.

What do you think is the best and most scalable way to generate these IDs?

I was thinking of the following architecture:

WDYT?

Upvotes: 12

Views: 20313

Answers (5)

The Busy Wizard

Reputation: 1166

The "correct" answer, which IMHO is not a great solution, is to generate a random ID and then check the DB for a collision. If there is one, generate a new ID and repeat until you find an unused one. Most of the time the first attempt will succeed (assuming your generation process is sufficiently random).
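A minimal Ruby sketch of generate-and-retry, assuming a 10-character alphanumeric ID is acceptable. The `EXISTING_IDS` set is a hypothetical stand-in for the posts collection. With MongoDB, it is safer to make the check and the insert a single step: store the ID as `_id` (or put a unique index on the field) and retry when the insert fails with a duplicate-key error. Otherwise two writers can pick the same ID between the check and the insert.

```ruby
require "securerandom"
require "set"

# Hypothetical stand-in for the posts collection: IDs already in use.
# With MongoDB, rely on the unique index on _id and retry on a
# duplicate-key error instead of checking first.
EXISTING_IDS = Set.new

# Generate a random alphanumeric ID, retrying on collision.
def generate_unique_id(existing, length: 10, max_attempts: 5)
  max_attempts.times do
    candidate = SecureRandom.alphanumeric(length) # [A-Za-z0-9], Ruby 2.5+
    # Set#add? returns nil when the element is already present, so this
    # checks for a collision and reserves the ID in one step.
    return candidate if existing.add?(candidate)
  end
  raise "no unused ID found after #{max_attempts} attempts"
end

id = generate_unique_id(EXISTING_IDS)
puts id
```

With 62 possible characters per position, a 10-character ID has 62^10 (about 8 x 10^17) possible values, so retries should be rare for a blog-sized collection.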

Note that this generate-and-check process is only necessary if you are concerned about the security implications of a time-based UUID or a counter-based ID. Both make IDs "guessable", which may or may not be an issue in a given situation. I would consider a time-based or counter-based ID sufficient for blog posts, though I don't know the details of your situation and reasoning.
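To illustrate why a counter-based ID is "guessable", here is a small sketch that encodes a counter in base 36 to get short URL-friendly IDs. The in-memory counter is a hypothetical stand-in for a shared, atomically incremented counter (for example a counter document updated with MongoDB's `$inc`); as written it is not safe across processes.

```ruby
# Counter-based short IDs. The in-memory counter is a hypothetical stand-in
# for a shared counter; it is only safe within a single process.
class CounterIdGenerator
  def initialize(start = 0)
    @counter = start
    @lock = Mutex.new
  end

  # Returns the next counter value encoded in base 36 (0-9, a-z).
  def next_id
    value = @lock.synchronize { @counter += 1 }
    value.to_s(36)
  end
end

gen = CounterIdGenerator.new(1294)
puts gen.next_id # => "zz"   (1295 = 35 * 36 + 35)
puts gen.next_id # => "100"  (1296 = 36 ** 2)
```

The IDs stay short, but anyone who sees one ID can enumerate its neighbours, which is exactly the trade-off described above.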

Upvotes: -1

Marcin Bilski

Reputation: 601

This is an old question, but here is another solution for anyone searching.

One way is to use a simple and fast substitution cipher. (The code below is based on someone else's code; I forgot where I took it from, so I cannot give proper credit.)

class Array
  def shuffle_with_seed!(seed)
    prng = (seed.nil?) ? Random.new() : Random.new(seed)
    size = self.size

    while size > 1
      # random index
      a = prng.rand(size)

      # last index
      b = size - 1

      # switch last element with random element
      self[a], self[b] = self[b], self[a]

      # reduce size and do it again
      size = b
    end

    self
  end

  def shuffle_with_seed(seed)
    self.dup.shuffle_with_seed!(seed)  
  end
end

class SubstitutionCipher

  def initialize(seed)
    normal = ('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a + [' ']
    shuffled = normal.shuffle_with_seed(seed)
    @map = normal.zip(shuffled).inject(:encrypt => {} , :decrypt => {}) do |hash,(a,b)|
      hash[:encrypt][a] = b
      hash[:decrypt][b] = a
      hash
    end
  end

  def encrypt(str)
    str.split(//).map { |char| @map[:encrypt][char] || char }.join
  end

  def decrypt(str)
    str.split(//).map { |char| @map[:decrypt][char] || char }.join
  end

end

You use it like this:

MY_SECRET_SEED = 3429824

cipher = SubstitutionCipher.new(MY_SECRET_SEED)

id = hash["_id"].to_s   # hash: a document read from MongoDB
encrypted_id = cipher.encrypt(id)
decrypted_id = cipher.decrypt(encrypted_id)

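Note that it only encrypts a-z, A-Z, 0-9 and the space character, leaving other characters intact. That is sufficient for BSON ids, whose string form is hexadecimal. Also note that this is obfuscation rather than strong encryption: each character always maps to the same substitute, so IDs created close together still share most of their characters.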
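A more compact variant, assuming Ruby's built-in `Array#shuffle(random:)` and `String#tr` are acceptable in place of the monkey-patched `shuffle_with_seed`. `TrSubstitutionCipher` is a hypothetical name. The permutation it produces is not guaranteed to match the hand-written shuffle for the same seed, so IDs encrypted with one should not be decrypted with the other.

```ruby
# Substitution cipher built on String#tr instead of per-character hashes.
class TrSubstitutionCipher
  # No '-', '^' or '\' in the alphabet, so tr treats every character literally.
  ALPHABET = (("a".."z").to_a + ("A".."Z").to_a + ("0".."9").to_a + [" "]).join

  def initialize(seed)
    # Same seed => same permutation, so IDs can be decrypted later.
    @shuffled = ALPHABET.chars.shuffle(random: Random.new(seed)).join
  end

  # Characters outside ALPHABET are left unchanged, as in the original.
  def encrypt(str)
    str.tr(ALPHABET, @shuffled)
  end

  def decrypt(str)
    str.tr(@shuffled, ALPHABET)
  end
end

cipher = TrSubstitutionCipher.new(3429824)
id = "4d128b6ea794fc13a8000001" # an example ObjectId string
encrypted = cipher.encrypt(id)
puts encrypted
puts cipher.decrypt(encrypted) == id
```

`String#tr` maps the whole string in one call, which avoids splitting into characters and rebuilding the string.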

Upvotes: 0

nilfalse

Reputation: 2419

What about using UUIDs?

See http://www.famkruithof.net/uuid/uuidgen for an example.
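In Ruby, a random (version 4) UUID is available from the standard library, as this sketch shows. Unlike a time-based UUID, a v4 UUID is not guessable, but at 36 characters (32 without hyphens) it is considerably longer than the IDs in the question.

```ruby
require "securerandom"

# Random (version 4) UUID, e.g. "2d931510-d99f-494a-8c67-87feb05e1594".
uuid = SecureRandom.uuid
# Dropping the hyphens gives a 32-character hex string for use in a URL.
compact = uuid.delete("-")
puts uuid
puts compact
```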

Upvotes: 3

Justin Jenkins

Reputation: 27080

This is exactly why the MongoDB developers designed ObjectIDs (the _id) the way they did: to scale across nodes, etc.

A BSON ObjectID is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter. (Newer MongoDB versions replace the machine and process id with a 5-byte random value.) Note that the timestamp and counter fields must be stored big endian, unlike the rest of BSON. This is because they are compared byte by byte, and we want to ensure a mostly increasing order. Here's the schema:

byte:   0 1 2 3 | 4 5 6   | 7 8 | 9 10 11
field:  time    | machine | pid | inc
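A small Ruby sketch that splits an ObjectID's 24-character hex string into the fields of the legacy layout above. `ObjectIdParts` and `decode_object_id` are hypothetical helpers, not part of any MongoDB driver. For IDs generated under the newer layout, the machine and pid fields would just hold the random value.

```ruby
# Fields of the legacy ObjectID layout: time | machine | pid | inc.
ObjectIdParts = Struct.new(:time, :machine, :pid, :inc)

def decode_object_id(hex)
  raise ArgumentError, "expected 24 hex chars" unless hex.match?(/\A\h{24}\z/)

  ObjectIdParts.new(
    Time.at(hex[0, 8].to_i(16)).utc, # bytes 0-3: seconds since epoch, big endian
    hex[8, 6],                       # bytes 4-6: machine id
    hex[14, 4].to_i(16),             # bytes 7-8: process id
    hex[18, 6].to_i(16)              # bytes 9-11: counter, big endian
  )
end

# The two IDs quoted in this answer, inserted one after the other.
a = decode_object_id("4d128b6ea794fc13a8000001")
b = decode_object_id("4d128e88a794fc13a8000002")
puts a.time, b.time
puts b.time - a.time # seconds between the two inserts
```

Decoding them shows the same machine and pid, counters 1 and 2, and timestamps 794 seconds apart, which is why only a few characters differ between the two strings.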

Traditional databases often use monotonically increasing sequence numbers for primary keys. In MongoDB, the preferred approach is to use Object IDs instead. Object IDs are more synergistic with sharding and distribution.

http://www.mongodb.org/display/DOCS/Object+IDs

So I'd say just use the ObjectIDs.

They don't look that bad when converted to a string. For example, these two were inserted right after each other:

4d128b6ea794fc13a8000001
4d128e88a794fc13a8000002

At first glance they look "guessable", but they really aren't that easy to guess:

4d128 b6e a794fc13a8000001
4d128 e88 a794fc13a8000002

And for a blog, I don't think it's that big of a deal; we use them in production all over the place.

Upvotes: 32

Christopher Mahan

Reputation: 7619

Why not make a web service that returns a globally unique ID, so that many web servers can participate without hitting duplicates?

What if your daily batch doesn't allocate enough items? Do you run it again midday?

I would implement the web-service client as a queue that a local process can inspect and refill as needed (when the server is less busy), keeping enough items in the queue that it doesn't need to run during peak usage. Makes sense?
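A minimal sketch of that client-side queue, assuming the ID web service can be called with a batch size and returns that many fresh IDs. `IdPool` and the fetcher block are hypothetical; the block stands in for the real HTTP call. `top_up` is meant to be called by a background job outside peak hours, and `next_id` only fetches inline if the queue has run dry.

```ruby
# Client-side buffer of pre-allocated IDs from a (hypothetical) ID service.
class IdPool
  def initialize(low_water: 20, batch_size: 100, &fetcher)
    @low_water = low_water
    @batch_size = batch_size
    @fetcher = fetcher # called with a batch size, returns an array of IDs
    @queue = []
    @lock = Mutex.new
    top_up
  end

  # Refill when the queue is at or below the low-water mark.
  def top_up
    @lock.synchronize do
      @queue.concat(@fetcher.call(@batch_size)) if @queue.size <= @low_water
    end
  end

  # Hand out one ID; fall back to fetching inline only if the queue is empty.
  def next_id
    @lock.synchronize do
      @queue.concat(@fetcher.call(@batch_size)) if @queue.empty?
      @queue.shift
    end
  end

  def size
    @lock.synchronize { @queue.size }
  end
end

# Usage with a fake service that hands out consecutive numbers.
counter = 0
pool = IdPool.new(low_water: 2, batch_size: 5) { |n| Array.new(n) { counter += 1 } }
ids = Array.new(12) { pool.next_id }
puts ids.inspect
```

Keeping the refill in `top_up` means request handling rarely waits on the ID service, at the cost of some allocated IDs going unused if a server restarts.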

Upvotes: 1
