In our web application we want to randomize the record IDs. The reason is because we want to hide how many entries there are in the DB already and we have unlisted things. In case IDs would be simple incremental numbers it would be easy to guess the IDs of unlisted things. As I see it there are three ways to do this: Simple Random Numbers Algorithm: Create a random number on insert. Check if the ID is already in use. If yes goto 1. Use this ID. Pro easy works with any size or type of the ID (32bit, 64bit, var. length, strings) Contra needs a transaction for possible race conditions (algorithm is not atomic) UUIDs Pro the likelihood of collisions is so low that you can ignore it Contra we want to have nice short URLs with the page title as an URL comment ( "#{id}--#{page_title} ), a UUID would shift this comment all the way to the right I guess UUIDs as primary keys would have a lower performance on joins? Encrypted IDs Algorithm: Read number from a sequence using nextval (atomic!) Encrypt the ID using a secret key and an encryption algorithm that works with the size used for the ID Pro no race conditions (no transactions necessary) Contra size of the ID column can never be changed if someone can crack/guess the key all was for nothing Time-stamps Suggested by @emboss Pro easy will not run out of IDs Contra might produce collisions (although one needs to test if it really happens) maybe somewhat guessable Random Public ID/Name Based Public ID Suggested by @viktor tron A second ID for all things that occur in an URL only used to find the record. Internally normal IDs are used (for joins etc). Pro internally everything stays sane a good random algorithm/the naming scheme should make URL guessing impossible (enough) Contra change a lot of things that use the IDs in public interfaces a user might expect that they can cut down URLs with such titles in them, but in this case the URLs won't work anymore I think I'll use the third option. Or are there more arguments against it? Is there an even better solution? We use Ruby on Rails 3.x and PostgreSQL 9.x. Edit: Unlisted does not mean private! It is meant like unlisted videos on YouTube. They are normal videos that just aren't listed in searches or the uploader's profile. So you can't really find them (without trying every possible ID), but everyone who knows the URL can access them. Of course a user that makes something unlisted and sends the link to someone else has to be aware that it might not stay unknown (the URL may be passed on and through linking might end up in a search engine). We also have another option to make things private. These are two different things. (I see that assuming that everyone knows what "unlisted" means was a mistake.)

ruby-on-railspostgresqlencryptionrandomuuid

Reputation: 7702

Randomize DB record IDs

In our web application we want to randomize the record IDs. The reason is because we want to hide how many entries there are in the DB already and we have unlisted things. In case IDs would be simple incremental numbers it would be easy to guess the IDs of unlisted things.

As I see it there are three ways to do this:

Simple Random Numbers

Algorithm:

Create a random number on insert.
Check if the ID is already in use. If yes goto 1.
Use this ID.

Pro

easy
works with any size or type of the ID (32bit, 64bit, var. length, strings)

Contra

needs a transaction for possible race conditions (algorithm is not atomic)

UUIDs

Pro

the likelihood of collisions is so low that you can ignore it

Contra

we want to have nice short URLs with the page title as an URL comment ("#{id}--#{page_title}), a UUID would shift this comment all the way to the right
I guess UUIDs as primary keys would have a lower performance on joins?

Encrypted IDs

Algorithm:

Read number from a sequence using nextval (atomic!)
Encrypt the ID using a secret key and an encryption algorithm that works with the size used for the ID

Pro

no race conditions (no transactions necessary)

Contra

size of the ID column can never be changed
if someone can crack/guess the key all was for nothing

Time-stamps

Suggested by @emboss

Pro

easy
will not run out of IDs

Contra

might produce collisions (although one needs to test if it really happens)
maybe somewhat guessable

Random Public ID/Name Based Public ID

Suggested by @viktor tron

A second ID for all things that occur in an URL only used to find the record. Internally normal IDs are used (for joins etc).

Pro

internally everything stays sane
a good random algorithm/the naming scheme should make URL guessing impossible (enough)

Contra

change a lot of things that use the IDs in public interfaces
a user might expect that they can cut down URLs with such titles in them, but in this case the URLs won't work anymore

I think I'll use the third option. Or are there more arguments against it? Is there an even better solution? We use Ruby on Rails 3.x and PostgreSQL 9.x.

Edit: Unlisted does not mean private! It is meant like unlisted videos on YouTube. They are normal videos that just aren't listed in searches or the uploader's profile. So you can't really find them (without trying every possible ID), but everyone who knows the URL can access them. Of course a user that makes something unlisted and sends the link to someone else has to be aware that it might not stay unknown (the URL may be passed on and through linking might end up in a search engine).

We also have another option to make things private. These are two different things. (I see that assuming that everyone knows what "unlisted" means was a mistake.)

Upvotes: 5

Answers (6)

Sergio Tulentsev

Reputation: 230411

Note: this answers the initial version of the question, from which it was not obvious that this is not a replacement for authorization logic.

This is a wrong solution to a wrong problem.

You think the problem is: users can guess ids of "unlisted" things and use them.

Actual problem is: users can access things without authorization.

Put authorization logic in place, allow user access only to items that he can legitimately access and forbid everything else.

Also

hide how many entries there are in the DB

I think there's no shame in being small, if this is the reason. Anyway, you can start your sequence from 100000 or increment it by N or employ another similar trick :)

Upvotes: 12

Divide

Reputation: 550

Shameless plug: https://github.com/dividedmind/pg_random_id

Just put the gem in, add migrations as per readme and you're done. This is based on scrambling a sequence, so there are guaranteed no collisions. You can have random integer or string ids.

Upvotes: 1

Viktor Trón

Reputation: 8894

I suggest a totally different way: simply do not show record IDs to users. You do not need to. Use another form of identification for url.

Since you say you want pretty urls, you could simply use a slugger/permalink gem, like https://github.com/norman/friendly_id

friendly_id's default slug generator offers functionality to check slug strings for uniqueness and, if necessary, appends a sequence to guarantee it.

Seriously, leave IDs alone :)

Upvotes: 3

emboss

Reputation: 39650

I think Sergio has given the perfect answer to your problem.

What you are trying to achieve is a good example of security by obscurity: instead of properly restricting access to certain unlisted items, you are trying to hide these items from people. But this still leaves the possibility to guess those hidden items, while access restriction makes it impossible to view a page one was not supposed to. And there's why access restriction is the clear winner: we have a 0 probability of viewing something we shouldn't have vs. a small probability of success. Even if it is negligible, 0 is always going to win over some value greater than zero.

I just wanted to add a few thoughts why your proposed solutions wouldn't work:

Random Numbers

Not using a SecureRandom here would already defeat the purpose. Using the normal rand makes the random numbers predictable, so anyone determined to "find" a hidden page would have a good chance in succeeding. But even if using a secure random number, you are only "spreading" your pages uniformly distributed in some range of numbers. The more records/pages eventually land in your app, the higher the probability will become that an attacker simply guessing randomly will finally succeed.

UUIDs

They are easily guessable once an attacker has found out how they are constructed. There's no security in their randomness as they are constructed following a deterministic scheme.

Encryption/Hashing

Using encryption is wrong here. It's wrong in the sense that it is invertible, and there'no need to, because that's what you are actually trying to prevent. Unless you are using authenticated encryption, the resulting ciphertext will be malleable, so there's a good chance an attacker could trick his way in on a forbidden page even without knowing the key that was used. Not to mention the numerous attacks they might try to recover the key. So a better solution would indeed be to use a secure hash function, properly randomized. Using a static salt isn't good enough for the same reasons it isn't good enough for passwords: a per-ID salt would be better to minimize the ability to precompute dictionaries. Precomputation is pretty easy, computing a table for ID's 1-100 with all sorts of salts is actually a promising strategy, since the attacker knows that it's sequential database IDs that were hashed here.

But no matter how hard you try, there's always a chance to get access by simply guessing. So to conclude with what has been said by Sergio already, what you actually need is authentication and an implementation of access restrictions.

If you want to hide the size of your data, why not try using a timestamp instead? In any case I would leave the database IDs untouched and add a special "display ID" column for what you want to display in your URLs, but leaving the original ID as the primary key.

Upvotes: 1

Veraticus

Reputation: 16064

While this is a laudable goal, changing the ID column away from an auto-incrementing integer is probably a mistake. When you get right down to it, the ID column should be for database use only. It's what allows the database to expose relations and ensure that records are findable separately from each other. You're trying to use the ID column to expose a piece of what I would consider business logic: that you want an essentially random reference number for your models. And when that business logic changes, you'll want to change the ID column, which will result in foreign keys being lost and will probably be an enormous headache.

To achieve this goal, you should make a new column, called something like "number", and implement one of these strategies on it. Then if you need to migrate to a new strategy, it'll be a lot easier for you to do so: and instead of doing Model.find(id), you'll just do Model.find_by_number(number).

Upvotes: 0

Nick

Reputation: 2413

Use a collision resistant hash function with some static salt along with the "internal ID" reference. For instance, SHA-256 will map elements in X to elements in H uniquely with low probability of collision; however, it's extremely hard (mathematically) to compute the element in X from an element in H.

In Ruby, do something like the following:

@hashed_id = Digest::SHA2.new << SHA_SALT << @foo.id

By the way, this isn't a form of encryption since anyone could generate the same hash given the same inputs without knowing a private key. It's also only a one-way function so there isn't a "decryption" algorithm either.

Upvotes: 2

Randomize DB record IDs

Simple Random Numbers

Pro

Contra

UUIDs

Pro

Contra

Encrypted IDs

Pro

Contra

Time-stamps

Pro

Contra

Random Public ID/Name Based Public ID

Pro

Contra

Answers (6)

This is a wrong solution to a wrong problem.

Also

Random Numbers

UUIDs

Encryption/Hashing

Related Questions