Defining acceptable lexicographic similarity of Firestore document IDs

Question

I've seen in the Firebase Firestore documentation's 'Best Practices' that you should:

Avoid high read or write rates to lexicographically close documents, or your application will experience contention errors.

An example given of how not to write document IDs is:

Customer1, Customer2, Customer3, ...

I'm mapping data from an external service into a Firestore collection, and I want to keep their original ID names. They are prefixed with entry_, but then suffixed with a random / unique string as follows:

entry_{Unique_String}, entry_{Unique_String}, ... entry_{Unique_String}

Does prefixing each document ID with entry_, followed by a random string, cause the documents to be classed as lexicographically close and therefore predisposed to hotspotting?

Or, would it only be classed as such if they were indeed named:

entry_1, entry_2, entry_3, entry_4 ...

I could of course strip / add entry_ to the IDs when reading / writing, but this would add more complexity to the server / client.*

*Edit to clarify as per Alex Mamo's comment:

Complexity would increase due to the following examples:

Introduction of strip / prepend "entry_" function wherever docs are being read / written in context of original dataset or need to be sent back to external service.
May require creation of document fields to track (e.g. type = "entry") where multiple categories of document ID are used in the same collection -- This may not be a disadvantage depending on use-case, e.g. if performing type comparisons.
Tedious to reimplement the above for other category types (e.g. foo_, bar_) that originate from the same external service, with the same prefixed unique strings.
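
To illustrate the first point, here is a minimal sketch of the kind of helper pair I'd need. The names ENTRY_PREFIX, to_firestore_id and to_external_id are hypothetical (they are not part of any Firebase SDK), and the sample ID is made up:

```python
ENTRY_PREFIX = "entry_"  # hypothetical constant for this category of ID


def to_firestore_id(external_id: str, prefix: str = ENTRY_PREFIX) -> str:
    """Strip the external service's prefix before using the value as a document ID."""
    if not external_id.startswith(prefix):
        raise ValueError(f"expected an ID starting with {prefix!r}: {external_id!r}")
    return external_id[len(prefix):]


def to_external_id(doc_id: str, prefix: str = ENTRY_PREFIX) -> str:
    """Re-attach the prefix before sending the ID back to the external service."""
    return prefix + doc_id


# Round trip with a made-up ID
doc_id = to_firestore_id("entry_a8Xk2Q")
external_id = to_external_id(doc_id)
```

Every other category (foo_, bar_) would need the same pair with a different prefix, which is the repetition I'd like to avoid.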

Alex Mamo · Accepted Answer

Firestore scales by spreading documents across its storage layer. In simplified terms, sequential IDs have more hashing collisions, which means you can hit write limits sooner. IDs that are more random ensure that writes are spread evenly across the storage layer. I advise you not to use 1, 2, 3, or 4, or combinations of them, as document IDs. Sequential IDs are an anti-pattern in Firestore because they will almost certainly cause scalability problems. So I strongly recommend you use those random document IDs.
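
As a rough illustration of why sequential IDs concentrate writes, the sketch below models only lexicographic ordering, not Firestore's actual storage internals. It records where each new ID would land in a sorted list of existing IDs. The helper names (insertion_positions, random_id) and the zero-padded entry_ format are assumptions made for the example:

```python
import bisect
import random
import string


def insertion_positions(existing, new_ids):
    """For each new ID, return its relative position (0.0 to 1.0) in the sorted key range."""
    keys = sorted(existing)
    positions = []
    for doc_id in new_ids:
        index = bisect.bisect_left(keys, doc_id)
        positions.append(index / len(keys))
        keys.insert(index, doc_id)
    return positions


def random_id(rng, length=20):
    """Build a random alphanumeric ID (toy stand-in for an auto-generated ID)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


rng = random.Random(42)  # seeded so the run is repeatable

# Sequential, zero-padded IDs: every new write sorts after all existing ones.
sequential_existing = [f"entry_{n:06d}" for n in range(1000)]
sequential_new = [f"entry_{n:06d}" for n in range(1000, 1010)]

# Random IDs: new writes land at scattered points in the key range.
random_existing = [random_id(rng) for _ in range(1000)]
random_new = [random_id(rng) for _ in range(10)]

seq_positions = insertion_positions(sequential_existing, sequential_new)
rand_positions = insertion_positions(random_existing, random_new)

print("sequential:", seq_positions)  # every value is 1.0 (the end of the range)
print("random:    ", [round(p, 2) for p in rand_positions])
```

In this toy model all sequential writes hit the same end of the key range, while the random ones are distributed across it.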

For more information, I recommend you read Dan McGrath's answer from the following post:

Limitations of using sequential IDs in Cloud Firestore

Edit:

Random IDs prefixed with a constant, like the ones you showed in one of your comments, can behave as if they were sequential.

Why do I say that?

The built-in generator for unique IDs, which Firestore uses when you call CollectionReference's add() method or CollectionReference's document() method without passing any parameters, generates random and highly unpredictable IDs, which prevents hitting certain hotspots in the backend infrastructure. Simply using a prefix with a random 6-digit number may increase the chance of hitting such hotspots. So ID collisions in this case are more likely at a larger scale. Besides that, I recommend you check Frank van Puffelen's answer from this post to see how those unique document IDs are generated. IMHO, you don't have to be concerned in any way about the random document IDs generated by that algorithm.
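
For a sense of scale, here is a minimal sketch of an ID generator in the same spirit. It assumes the commonly described format of 20 characters drawn from the 62 alphanumeric symbols; the actual SDK implementation may differ, and generate_auto_style_id is a hypothetical name, not an SDK function:

```python
import secrets
import string

# Assumed format: 20 characters from A-Z, a-z, 0-9 (not taken from SDK source).
AUTO_ID_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits
AUTO_ID_LENGTH = 20


def generate_auto_style_id(length: int = AUTO_ID_LENGTH) -> str:
    """Return a random alphanumeric ID using a cryptographically strong source."""
    return "".join(secrets.choice(AUTO_ID_ALPHABET) for _ in range(length))


# Compare the number of possible values with a random 6-digit suffix.
auto_style_keyspace = len(AUTO_ID_ALPHABET) ** AUTO_ID_LENGTH
six_digit_keyspace = 10 ** 6

new_id = generate_auto_style_id()
```

Under these assumptions the ID space is 62^20 values versus 1,000,000 for a 6-digit suffix, which is why collisions are a practical concern only for the latter.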
