Jimmy Kane

Reputation: 16845

How to create a Firestore safe Document ID based on a string

After some discussion on my earlier question about base64 not being safe for Firestore IDs, I would like to know how to encode a string into a Firestore "safe" Document ID.

Here is the problem:

I asked about base64 in another question, and it is not safe because its output can contain /

So what would be a safe way to encode that string without losing the entropy of the username that the external service provides? There could be a username such as dimi/test1 and another such as dimitest1, so just stripping out characters is not an option.

Also, since that service has an open API and my service exposes the document IDs via URLs, I would like not to expose the other service's usernames in my app's URLs.

Any suggestions?

Upvotes: 1

Views: 2001

Answers (3)

Ignacio Bustos

Reputation: 1504

EDIT

To transform strings into unique IDs very quickly, use crypto.createHash() instead. The result is always the same for a given input string.

You can use MD5 or SHA256, as both take about the same time: 2.2 s on average to compute 1 million unique IDs.

Here is the code:

const crypto = require('crypto');

function uniqueId(string, algorithm = 'md5') {
  return crypto.createHash(algorithm).update(string).digest('hex');
}

console.log('started');
console.time('generateIDsMD5')
for (let i = 0; i < 1000000; i++) {
  uniqueId('a string ' + i);
}
console.timeEnd('generateIDsMD5');

console.time('generateIDsSHA256')
for (let i = 0; i < 1000000; i++) {
  uniqueId('a string ' + i, 'sha256');
}
console.timeEnd('generateIDsSHA256');

// For instance, it takes around 2.2 s on average
// to generate 1 million unique IDs with MD5 or SHA256 hashing

console.log('MD5 string ', uniqueId('a string ' + 1));
console.log('MD5 sameString ', uniqueId('a string ' + 2));
console.log('MD5 sameString ', uniqueId('a string ' + 2));
console.log('SHA256 string ', uniqueId('a string ' + 1, 'sha256'));
console.log('SHA256 sameString ', uniqueId('a string ' + 2, 'sha256'));
console.log('SHA256 sameString ', uniqueId('a string ' + 2, 'sha256'));
console.log('finished');

PREVIOUS ANSWER

I adapted the ID-generation code from Firebase so that it runs directly in Node.js, and added a small test. It takes up to 3 s to generate 1 million IDs and only about 300 ms for 100,000 IDs, which is the daily usage you are considering.

This uses the crypto module, which is considered very safe when run in a Node.js environment.

Here is the function, wrapped with a usage example:

const crypto = require('crypto');

// Note: `bytesLength` is the number of characters in the returned ID
// (each character is one byte, since only ASCII characters are used).
function autoId(bytesLength) {
  const chars =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  let id = '';
  while (id.length < bytesLength) {
    const bytes = crypto.randomBytes(40);
    bytes.forEach(b => {
      // Length of `chars` is 62. We only take bytes between 0 and 62*4-1
      // (both inclusive). The value is then evenly mapped to indices of `chars`
      // via a modulo operation.
      const maxValue = 62 * 4 - 1;
      if (id.length < bytesLength && b <= maxValue) {
        id += chars.charAt(b % 62);
      }
    });
  }
  return id;
}

console.log('started');
console.time('generateIDs')
for (let i = 0; i < 1000000; i++) {
  autoId(20);
}
console.timeEnd('generateIDs');
// For instance, It will take around 3s average
// to generate 1 Million Unique IDs with 20 bytes length

console.log('example 20bytes ', autoId(20));
console.log('example 40bytes ', autoId(40));
console.log('example 60bytes ', autoId(60));
console.log('finished');

Run it with node thisfile.js and you will see the result.

Since Firebase is mostly open source, the official ID generator used in Node.js can be found here: https://github.com/googleapis/nodejs-firestore/blob/4f4574afaa8cf817d06b5965492791c2eff01ed5/dev/src/util.ts#L52

IMPORTANT

If you are going to join two IDs, do not use a slash (/), since, as you know, it is not allowed. Use an underscore (_) instead, or no separator at all: because you control the length of each ID, you know where to split the joined ID (a 40-byte ID contains two 20-byte IDs, for instance).

Firestore limits document IDs to 1,500 bytes, so you have plenty of room to play with.
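
The join-without-separator idea can be sketched as follows. This is only an illustration: ID_LENGTH, joinIds and splitIds are hypothetical names, and the sketch assumes every ID has the same fixed length of 20 characters.

```javascript
// Assumed fixed length of every individual ID (hypothetical constant).
const ID_LENGTH = 20;

// Concatenate two fixed-length IDs without any separator.
function joinIds(first, second) {
  if (first.length !== ID_LENGTH || second.length !== ID_LENGTH) {
    throw new Error(`Both IDs must be exactly ${ID_LENGTH} characters long`);
  }
  return first + second;
}

// Recover the two original IDs by cutting the joined ID in the middle.
function splitIds(joined) {
  if (joined.length !== 2 * ID_LENGTH) {
    throw new Error(`Joined ID must be exactly ${2 * ID_LENGTH} characters long`);
  }
  return [joined.slice(0, ID_LENGTH), joined.slice(ID_LENGTH)];
}

const joined = joinIds('A'.repeat(ID_LENGTH), 'B'.repeat(ID_LENGTH));
console.log(joined.length);
console.log(splitIds(joined));
```

Because the split position is derived from the length alone, this only works if all IDs are generated with the same length.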

More info: https://firebase.google.com/docs/firestore/quotas#limits

Upvotes: 2

Jek

Reputation: 5666

Use encodeURI() followed by SHA256. This keeps the document ID within Firestore's constraints:

Must be valid UTF-8 characters
Must be no longer than 1,500 bytes
Cannot contain a forward slash (/)
Cannot solely consist of a single period (.) or double periods (..)
Cannot match the regular expression __.*__
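
As a rough illustration, the constraints listed above could be checked with a small helper like the one below. isValidDocumentId is a hypothetical function, not part of any Firebase SDK. Rejecting empty strings is an added assumption, and the sketch does not verify that the string is well-formed UTF-8.

```javascript
// Hypothetical helper: checks a candidate document ID against the
// constraints listed above. Not part of any Firebase SDK.
function isValidDocumentId(id) {
  // Assumption added for this sketch: reject non-strings and empty IDs.
  if (typeof id !== 'string' || id.length === 0) return false;
  // Must be no longer than 1,500 bytes (measured as UTF-8).
  if (Buffer.byteLength(id, 'utf8') > 1500) return false;
  // Cannot contain a forward slash.
  if (id.includes('/')) return false;
  // Cannot solely consist of a single period or double periods.
  if (id === '.' || id === '..') return false;
  // Cannot match the regular expression __.*__
  if (/^__.*__$/.test(id)) return false;
  return true;
}

console.log(isValidDocumentId('dimi/test1'));
console.log(isValidDocumentId('dimitest1'));
```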

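encodeURI() takes care of the valid UTF-8 requirement: it percent-encodes non-ASCII characters, so its output is plain ASCII.

A SHA256 digest has a fixed length of 256 bits (32 bytes, or 64 characters when hex-encoded), so it never exceeds the 1,500-byte limit.

The characters of a hex-encoded SHA256 digest are [a-fA-F0-9] according to https://stackoverflow.com/a/12618366/3073280.

Lastly, you mentioned that the ID needs to preserve entropy. SHA256 diffuses its input well.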
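
A minimal sketch of this approach in Node.js, assuming the built-in crypto module and a hex digest. safeDocumentId is a hypothetical name chosen for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical helper: encodeURI() followed by SHA256, hex-encoded.
function safeDocumentId(username) {
  // Percent-encode the input so the hashed string is plain ASCII.
  const encoded = encodeURI(username);
  // The hex digest is always 64 characters from [0-9a-f].
  return crypto.createHash('sha256').update(encoded, 'utf8').digest('hex');
}

const a = safeDocumentId('dimi/test1');
const b = safeDocumentId('dimitest1');
console.log(a);
console.log(b);
console.log(a !== b); // the two usernames map to different IDs
```

The same input always yields the same ID, and the original username does not appear in the result.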

Upvotes: 2

Jimmy Kane

Reputation: 16845

I used Base58, which was the safest option I could find.
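
For reference, here is a minimal sketch of Base58 encoding in Node.js. It assumes the Bitcoin-style alphabet, and base58Encode is a hypothetical helper rather than the exact code I used. Keep in mind that Base58 is a reversible encoding, so anyone can decode the original string from the ID.

```javascript
// Bitcoin-style Base58 alphabet (assumption): no 0, O, I, l, and no "/".
const ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';

// Hypothetical helper: Base58-encode the UTF-8 bytes of a string.
function base58Encode(text) {
  const bytes = Buffer.from(text, 'utf8');

  // Interpret the bytes as one big-endian integer.
  let value = 0n;
  for (const byte of bytes) {
    value = value * 256n + BigInt(byte);
  }

  // Repeatedly divide by 58, collecting the remainders as digits.
  let encoded = '';
  while (value > 0n) {
    encoded = ALPHABET[Number(value % 58n)] + encoded;
    value /= 58n;
  }

  // Each leading zero byte is represented by the first alphabet character.
  for (const byte of bytes) {
    if (byte !== 0) break;
    encoded = ALPHABET[0] + encoded;
  }
  return encoded;
}

console.log(base58Encode('dimi/test1'));
console.log(base58Encode('dimitest1'));
```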

Upvotes: -1
