Sarreph

Reputation: 2015

Defining acceptable lexicographic similarity of Firestore document IDs

I've seen in the Firebase Firestore documentation's 'Best Practices' that you should:

Avoid high read or write rates to lexicographically close documents, or your application will experience contention errors.

An example given of how not to write document IDs is:

Customer1, Customer2, Customer3, ...


I'm mapping data from an external service into a Firestore collection, and I want to keep their original ID names. They are prefixed with entry_, but then suffixed with a random / unique string as follows:

entry_{Unique_String}, entry_{Unique_String}, ... entry_{Unique_String}

Does prefixing each document ID with entry_, followed by a random string, make the documents lexicographically close and therefore prone to hotspotting?

Or, would it only be classed as such if they were indeed named:

entry_1, entry_2, entry_3, entry_4 ... <and so on>

I could of course strip / add entry_ to the IDs when reading / writing, but this would add more complexity to the server / client.*
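A minimal sketch of the strip / add approach mentioned above, to show the kind of mapping layer it would need. The helper names `to_doc_id` and `to_external_id` are hypothetical and not part of any Firebase SDK; the only thing taken from my setup is the `entry_` prefix.

```python
PREFIX = "entry_"  # constant prefix used by the external service


def to_doc_id(external_id: str) -> str:
    """Strip the prefix before using the ID as a Firestore document ID (hypothetical helper)."""
    if not external_id.startswith(PREFIX):
        raise ValueError(f"expected an ID starting with {PREFIX!r}, got {external_id!r}")
    return external_id[len(PREFIX):]


def to_external_id(doc_id: str) -> str:
    """Restore the prefix when reading a document back (hypothetical helper)."""
    return PREFIX + doc_id
```

Every read and write path on both server and client would have to go through a pair of functions like these, which is the extra complexity I'd like to avoid.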

*Edit to clarify as per Alex Mamo's comment:

Complexity would increase due to the following examples:

Upvotes: 2

Views: 464

Answers (1)

Alex Mamo

Reputation: 138969

Firestore's scalability comes from the fact that it spreads documents out over its storage layer. Put simply, sequential IDs produce more hashing collisions, which means you can hit write limits sooner. IDs that are more random ensure the writes are spread evenly across the storage layer. I advise you not to use 1, 2, 3, 4, or combinations of them, as document IDs. Sequential IDs are an anti-pattern in Firestore because they will cause scalability problems as your write rate grows. So I strongly recommend you use random document IDs.
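To illustrate what "lexicographically close" means for successive writes, here is a small sketch. It only models a sorted list of IDs and says nothing about Firestore's internal storage; the function name `insertion_positions` and the zero-padded `Customer` IDs are assumptions made for the example.

```python
import bisect
import random
import string


def insertion_positions(ids):
    """For each ID in write order, return where it lands in the sorted
    key space at the moment of insertion, as a fraction from 0.0 to 1.0."""
    sorted_ids = []
    positions = []
    for doc_id in ids:
        idx = bisect.bisect_left(sorted_ids, doc_id)
        positions.append(idx / max(len(sorted_ids), 1))
        sorted_ids.insert(idx, doc_id)
    return positions


# Sequential, zero-padded IDs: every new write lands at the very end.
sequential_ids = [f"Customer{i:04d}" for i in range(100)]
sequential_positions = insertion_positions(sequential_ids)

# Random IDs: new writes land at scattered positions.
rng = random.Random(0)
alphabet = string.ascii_letters + string.digits
random_ids = ["".join(rng.choice(alphabet) for _ in range(20)) for _ in range(100)]
random_positions = insertion_positions(random_ids)
```

With the sequential IDs, every write after the first lands at position 1.0, i.e. always next to the previous write. With the random IDs, the positions are scattered across the whole range.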

For more information, I recommend you read Dan McGrath's answer from the following post:

Edit:

Random IDs prefixed with a constant, like the ones you showed in one of your comments, can behave as if they were sequential.

Why do I say that?

The built-in generator that Firestore uses when you call CollectionReference's add() method, or CollectionReference's document() method without passing any arguments, produces random and highly unpredictable IDs, which prevents hitting certain hotspots in the backend infrastructure. Simply using a prefix followed by a random 6-digit number may increase the chance of hitting them. Collisions between IDs are also more likely in that case as the collection grows. Besides that, I recommend you check Frank van Puffelen's answer from this post to see how those unique document IDs are generated. IMHO, you don't have to be concerned in any way about the random document IDs generated by that algorithm.
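As a rough sketch of the difference in scale, the snippet below generates an ID in the same shape as the client-generated ones and compares the size of the two ID spaces. It assumes auto-generated IDs are 20 characters drawn from A-Z, a-z and 0-9; it is not the SDK's actual code, and `auto_id_sketch` and `id_space_size` are hypothetical names.

```python
import secrets
import string

# Assumption: 20 characters from a 62-character alphabet, matching the
# shape of client-generated Firestore IDs. Not the SDK implementation.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits
AUTO_ID_LENGTH = 20


def auto_id_sketch() -> str:
    """Return a random 20-character ID using a cryptographically strong source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(AUTO_ID_LENGTH))


def id_space_size(alphabet_size: int, length: int) -> int:
    """Number of distinct IDs for a given alphabet size and ID length."""
    return alphabet_size ** length


six_digit_space = id_space_size(10, 6)  # a random 6-digit numeric suffix
auto_id_space = id_space_size(len(ALPHABET), AUTO_ID_LENGTH)
```

A 6-digit suffix allows only one million distinct values, while the 20-character form allows 62^20, so the suffix-only scheme leaves far less room before two IDs coincide.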

Upvotes: 5
