kuzdu
kuzdu

Reputation: 7524

Generated unique id with 6 characters - handling when too much ids already used

​In my program you can book an item. This item has an id with 6 characters from 32 possible characters.

So my possibilities are 32^6. Every id must be unique.

func tryToAddItem {
 if !db.contains(generateId()) {
   addItem()
 } else {
   tryToAddItem()
 }
}

For example 90% of my ids are used. So the probability that I call tryToAddItem 5 times is 0,9^5 * 100 = 59% isn't it?

So that is quite high. This are 5 database queries on a lot of datas.

When the probability is so high I want to implement a prefix „A-xxxxxx“. What is a good condition for that? At which time do I will need a prefix?

In my example 90% ids were use. What is about the rest? Do I threw it away? What is about database performance when I call tryToAddItem 5 times? I could imagine that this is not best practise.

Upvotes: 1

Views: 49

Answers (2)

kuzdu
kuzdu

Reputation: 7524

Thanks for response. I searched for alternatives and want show three possibilities.

First possibility: Create an UpcomingItemIdTable with 200 (more or less) valid itemIds. A task in the background can calculate them every minute (or what you need). So the action tryToAddItem will always get a valid itemId.

Second possibility

Is remapping all ids to a larger space one time only a possibility?

In my case yes. I think for other problems the answer will be: it depends.

Third possibility: Try to generate an itemId and when there is a collision try it again.

Possible collisions handling: Do some test before. Measure the time to generate itemIds when there are already 1000,10.000,100.000,1.000.000 etc. entries in the table. When the tryToAddItem method needs more than 100ms (or what you prefer) then increase your length from 6 to 7,8,9 characters.

Some thoughts

  • every request must be atomar
  • create an index on itemId
  • Disadvantages for long UUIDs in API: See https://zalando.github.io/restful-api-guidelines/#144

    less usable, because...
    -cannot be memorized and easily communicated by humans
    -harder to use in debugging and logging analysis
    -less convenient for consumer facing usage
    -quite long: readable representation requires 36 characters and comes with higher memory and bandwidth consumption
    -not ordered along their creation history and no indication of used id volume
    -may be in conflict with additional backward compatibility support of legacy ids [...]

TLDR: For my case every possibility is working. As so often it depends on the problem. Thanks for input.

Upvotes: 0

Imran
Imran

Reputation: 13458

For example 90% of my ids are used. So the probability that I call tryToAddItem 5 times is 0,9^5 * 100 = 59% isn't it?

Not quite. Let's represent the number of call you make with the random variable X, and let's call the probability of an id collision p. You want the probability that you make the call at most five times, or in general at most k times:

P(X≤k) = P(X=1) + P(X=2) + ... + P(X=k)
= (1-p) + (1-p)*p + (1-p)*p^2 +... + (1-p)*p^(k-1)
= (1-p)*(1 + p + p^2 + .. + p^(k-1))

If we expand this out all but two terms cancel and we get:

= 1- p^k

Which we want to be greater than some probability, x:

1 - p^k > x

Or with p in terms of k and x:

p < (1-x)^(1/k)

where you can adjust x and k for your specific needs.

If you want less than a 50% probability of 5 or more calls, then no more than (1-0.5)^(1/5) ≈ 87% of your ids should be taken.

First of all make sure there is an index on the id columns you are looking up. Then I would recommend thinking more in terms of setting a very low probability of a very bad event occurring. For example maybe making 20 calls slows down the database for too long, so we'd like to set the probability of this occurring to <0.1%. Using the formula above we find that no more than 70% of ids should be taken.

But you should also consider alternative solutions. Is remapping all ids to a larger space one time only a possibility?

Or if adding ids with prefixes is not a big deal then you could generate longer ids with prefixes for all new items going forward and not have to worry about collisions.

Upvotes: 1

Related Questions