quarks

Reputation: 35282

Generating Hash codes with Google App Engine (GAE)

I need to design a way to provide a hash for every document stored in my application.

Using existing hash libraries (BCrypt, etc.) and even BSON ObjectId generates a nice "hash" or "key", but it's quite long.

I also understand that the only way to get a short hash is to hash fewer distinct strings (if I'm not mistaken), for example by hashing Long IDs starting from 0, 1, 2, 3 and so on.
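To illustrate the length point: a cryptographic hash has a fixed output size no matter how small its input is, while a plain base conversion of a sequential ID only grows with the ID's value. The sketch below uses the JDK's SHA-256 `MessageDigest` and `Long.toString(long, int radix)`; the class name `HashLengthDemo` is hypothetical and only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class: compares hash length with base-36 length.
final class HashLengthDemo {
    private HashLengthDemo() {}

    // SHA-256 always produces 32 bytes, i.e. 64 hex characters, for any input.
    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two hex chars per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Base-36 rendering of a numeric ID: short for small IDs, longer as IDs grow.
    static String base36(long id) {
        return Long.toString(id, 36);
    }
}
```

`sha256Hex("1")` is 64 characters long, exactly like the hash of a whole document, whereas `base36(1_000_000L)` is just `"lfls"`. Shortness comes from the small ID, not from hashing it.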

However easy this is to think of, it's fairly hard to implement in the Google App Engine (GAE) Datastore, or at least I haven't really come across this need until now.

The GAE Datastore stores entities across servers and even across data centers, and auto-increment IDs are not really meant for this.

What could be the strategy to achieve this?

Upvotes: 2

Views: 838

Answers (1)

zengabor

Reputation: 2023

As far as I understand, you are looking for a way to generate short, unique, alphanumeric identifiers for your documents: the kind of thing URL shorteners do (see the questions "Making a short URL similar to TinyURL.com", "What's the best way to create a short hash, similiar to what tiny Url does?" or "How to make unique short URL with Python?", etc.). My answer is based on this assumption.

The datastore generates unique auto-incremented IDs, so you can rely on that. Multiple data centers are not a problem: your IDs will be unique, short (at least initially) and collision-free. This is probably how TinyURL and similar services accomplish it.

You can even request one or more unique IDs before you persist your new document in the datastore by using the DatastoreService.allocateIds(), for example:

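import com.google.appengine.api.datastore.*;
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
KeyRange keyRange = datastore.allocateIds("MyDocumentModel", 1);
long uniqueId = keyRange.getStart().getId();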

You can then "hash" this ID, or you can get an even shorter alphanumeric ID by simply transcoding the integer ID to Base64 (or Base36, or some other base where you define your own characters; e.g., omitting vowels helps you avoid accidentally generating obvious swear words).
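A minimal sketch of the "define your own characters" idea, assuming an alphabet of the ten digits plus the 21 lowercase consonants (31 symbols, no vowels a/e/i/o/u). The class `ShortIds` and its methods are hypothetical helpers, not part of the App Engine SDK; if plain Base36 is enough, the JDK's `Long.toString(id, 36)` and `Long.parseLong(s, 36)` already do this.

```java
// Hypothetical helper class (not part of the App Engine SDK).
final class ShortIds {
    // Digits plus lowercase consonants: 10 + 21 = 31 symbols, no vowels a/e/i/o/u.
    private static final String ALPHABET = "0123456789bcdfghjklmnpqrstvwxyz";
    private static final int BASE = ALPHABET.length();

    private ShortIds() {}

    // Encodes a non-negative ID (e.g. one returned by allocateIds) in base 31.
    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative: " + id);
        }
        if (id == 0) {
            return String.valueOf(ALPHABET.charAt(0));
        }
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % BASE))); // least significant digit first
            id /= BASE;
        }
        return sb.reverse().toString(); // most significant digit first
    }

    // Decodes a string produced by encode() back to the numeric ID.
    static long decode(String s) {
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty id");
        }
        long value = 0;
        for (int i = 0; i < s.length(); i++) {
            int digit = ALPHABET.indexOf(s.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + s.charAt(i));
            }
            // value = value * BASE + digit, throwing on long overflow
            value = Math.addExact(Math.multiplyExact(value, BASE), digit);
        }
        return value;
    }
}
```

With this alphabet, `ShortIds.encode(31)` yields `"10"` and `ShortIds.decode("10")` gives back `31`. Because the mapping is reversible, you can look the document up by its numeric ID and never need to store the short string separately.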

If predictability is an issue, you can prefix or suffix this alphanumeric ID with some random characters.
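A sketch of the random-suffix idea, assuming a fixed suffix length so the ID part can be split off again. `RandomSuffix` is a hypothetical helper; it uses `java.security.SecureRandom` so the suffix is hard to guess.

```java
import java.security.SecureRandom;

// Hypothetical helper: appends unguessable characters to an already-encoded ID.
final class RandomSuffix {
    private static final String ALPHABET = "0123456789bcdfghjklmnpqrstvwxyz";
    private static final SecureRandom RANDOM = new SecureRandom();

    private RandomSuffix() {}

    // Returns encodedId followed by suffixLength random characters from ALPHABET.
    static String withSuffix(String encodedId, int suffixLength) {
        if (suffixLength < 0) {
            throw new IllegalArgumentException("suffixLength must be >= 0");
        }
        StringBuilder sb = new StringBuilder(encodedId);
        for (int i = 0; i < suffixLength; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Removes the fixed-length random suffix, leaving the encoded ID part.
    static String stripSuffix(String publicId, int suffixLength) {
        if (suffixLength < 0 || suffixLength > publicId.length()) {
            throw new IllegalArgumentException("bad suffixLength: " + suffixLength);
        }
        return publicId.substring(0, publicId.length() - suffixLength);
    }
}
```

Uniqueness still comes entirely from the datastore ID part. The suffix only adds unpredictability if you store the full public string with the entity and compare it on lookup. Otherwise any suffix would resolve to the same document.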

Upvotes: 3
