Dealing with race conditions and starvation when generating unique IDs using MongoDB + NodeJS

I am using MongoDB to generate unique IDs of this format:

{ID TYPE}{ZONE}{ALPHABET}{YY}{XXXXX}

Here ID TYPE will be a letter from {U, E, V} depending on the input, ZONE will be from the set {N, S, E, W}, YY will be the last 2 digits of the current year, and XXXXX will be a 5-digit number starting from 0 (padded with leading 0s to make it 5 digits long). When XXXXX reaches 99999, the ALPHABET part will be incremented to the next letter (starting from A).

I will receive ID TYPE and ZONE as input and will have to return the generated unique ID as output. Every time I have to generate a new ID, I will read the last generated ID for the given ID TYPE and ZONE, increment the number part by 1 (XXXXX + 1), save the newly generated ID in MongoDB, and return it to the user.
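Roughly, the increment helper I have in mind (createNextId in the code below) looks like this sketch; I am assuming the number part restarts at 00000 after the alphabet rolls over, and I am ignoring the very first ID for a key:

// Sketch of the increment logic; the rollover behaviour at 99999 is my assumption.
function createNextId(lastId: string): string {
  const typeAndZone = lastId.slice(0, 2);        // {ID TYPE}{ZONE}, e.g. "US"
  const alphabet = lastId.charAt(2);             // {ALPHABET}, e.g. "A"
  const year = lastId.slice(3, 5);               // {YY}, e.g. "21"
  const counter = parseInt(lastId.slice(5), 10); // {XXXXX}, e.g. 0

  if (counter < 99999) {
    // normal case: increment and left-pad back to 5 digits
    return typeAndZone + alphabet + year + String(counter + 1).padStart(5, "0");
  }
  // counter exhausted: move to the next alphabet and restart the number part
  const nextAlphabet = String.fromCharCode(alphabet.charCodeAt(0) + 1);
  return typeAndZone + nextAlphabet + year + "00000";
}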

This code will run on a single NodeJS server, and multiple clients can call this method. Is there a possibility of a race condition like the one described below if I am only running a single server instance?

  1. First client reads last generated ID as USA2100000
  2. Second client reads last generated ID as USA2100000
  3. First client generates the new ID and saves it as USA2100001
  4. Second client generates the new ID and saves it as USA2100001

Since 2 clients have generated IDs, the DB should finally contain USA2100002.

To overcome this, I am using MongoDB transactions. My code in TypeScript using Mongoose as ODM is something like this:

      session = await startSession();
      session.startTransaction();
      // read the last generated ID inside the transaction
      lastId = (await GeneratedId.findOne({ key: idKeyStr }, "value").session(session))?.value;
      lastId = createNextId(lastId);
      const newIdObj: any = {
        key: `Type:${idPrefix}_Zone:${zone_letter}`,
        value: lastId,
      };
      await GeneratedId.findOneAndUpdate({ key: idKeyStr }, newIdObj, {
        upsert: true,
        new: true,
        session,
      });
      await session.commitTransaction();
      session.endSession();
  1. What exactly will happen when the situation I described above occurs with this code?
  2. Will the second client's transaction throw an exception, so that I have to abort or retry the transaction in my code, or will it handle the retry on its own?
  3. How does MongoDB (or other DBs) handle transactions? Does MongoDB lock the documents involved in the transaction? Are they exclusive locks (won't even allow other clients to read)?
  4. If the same client keeps failing to commit its transaction, this client would be starved. How do I deal with this starvation?

Upvotes: 1

Views: 756

Answers (2)

prasad_

Reputation: 14287

This is what you can try. You need to store only one document in the GeneratedId collection. This document will hold the last generated id's value. The document must have a known _id field; for example, let's say it is an integer with value 1. So, the document can look like this:

{ _id: 1, lastGeneratedId: "<some value>" }

In your application, you can use the findOneAndUpdate() method with the filter { _id: 1 }, which means you are targeting a single document. This update will be an atomic operation; as per the MongoDB documentation, "All write operations in MongoDB are atomic on the level of a single document." Do you need a transaction in this case? No. The update operation is atomic and performs better than using a transaction. See Update Documents - Atomicity.

Then, how do I generate the new generated id and retrieve it?

I will receive ID TYPE and ZONE...

Using the above input values and the existing lastGeneratedId value, you can arrive at the new value and update the document with it. The new value can be calculated / formatted within the Aggregation Pipeline of the update operation - you can use the feature Updates with Aggregation Pipeline (available with MongoDB v4.2 or higher).

Note the findOneAndUpdate() method returns the updated (or modified) document when you use the update option new: true. This returned document will have the newly generated lastGeneratedId value.

The update can look like this (shown here with Mongoose; with the native NodeJS driver use the returnDocument: "after" option instead of new: true):

const filter = { _id: 1 }
const update = [
    { $set: { lastGeneratedId: { /* your calculation of the new value goes here... */ } } }
]
const options = { new: true, projection: { _id: 0, lastGeneratedId: 1 } }

const newId = (await GeneratedId.findOneAndUpdate(filter, update, options))['lastGeneratedId']
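As one possible illustration, the calculation can be built from aggregation string operators. The sketch below only increments the trailing 5-digit counter and does not handle the alphabet rollover at 99999:

const update = [
  {
    $set: {
      lastGeneratedId: {
        $let: {
          vars: {
            // e.g. "USA2100042" -> prefix "USA21", nextNum 43
            prefix: { $substrCP: ["$lastGeneratedId", 0, 5] },
            nextNum: { $add: [{ $toInt: { $substrCP: ["$lastGeneratedId", 5, 5] } }, 1] }
          },
          in: {
            $concat: [
              "$$prefix",
              // left-pad the incremented counter back to 5 digits
              {
                $substrCP: [
                  { $concat: ["00000", { $toString: "$$nextNum" }] },
                  { $subtract: [{ $strLenCP: { $concat: ["00000", { $toString: "$$nextNum" }] } }, 5] },
                  5
                ]
              }
            ]
          }
        }
      }
    }
  }
]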

Note about the JavaScript function:

With MongoDB v4.4 you can use JavaScript functions within an Aggregation Pipeline, and this also applies to Updates with Aggregation Pipeline. For details see the $function aggregation pipeline operator.
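For example, the same calculation written as a $function sketch (the body is executed by the server; again, the alphabet rollover at 99999 is left out):

const update = [
  {
    $set: {
      lastGeneratedId: {
        $function: {
          lang: "js",
          args: ["$lastGeneratedId"],
          // runs server-side, e.g. "USA2100042" -> "USA2100043"
          body: function (lastId) {
            var prefix = lastId.slice(0, 5);
            var next = parseInt(lastId.slice(5), 10) + 1;
            return prefix + ("00000" + next).slice(-5);
          }
        }
      }
    }
  }
]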

Upvotes: 1

Alex Blex

Reputation: 37048

You are using MongoDB to store the ID; that is state. Generating the ID is a function. You would be using MongoDB to generate the ID only if the mongodb process took the function's arguments and returned the generated ID. That is not what you are doing: you are generating the ID in nodejs.

The number of threads, or rather event loops, is critical as it defines the architecture, but either way you don't need transactions. Transactions in mongodb are called "multi-document transactions" precisely to highlight that they are intended for consistent updates of several documents at once. The very first paragraph of https://docs.mongodb.com/manual/core/transactions/ warns you that if you update a single document there is no room for transactions.

A single-threaded application does not require any synchronisation. You can reliably read the latest generated ID on start and guarantee the ID is unique within the nodejs process. If you exclude mongodb and other I/O from the generation function, it becomes synchronous, so you can maintain the state of the ID within the nodejs process and guarantee its uniqueness. Once generated, you can persist it in the db asynchronously. In the worst-case scenario you may have a gap in the sequential numbers, but no duplicates.
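A minimal sketch of that approach (GeneratedId and createNextId are the names from the question; the rest is illustrative):

// lastIds is seeded once on startup from the db, one entry per {ID TYPE}{ZONE} key
const lastIds = new Map<string, string>();

function generateId(key: string): string {
  const nextId = createNextId(lastIds.get(key)!); // purely synchronous, no I/O, so no interleaving
  lastIds.set(key, nextId);                       // in-process state is updated before anything async runs
  // persist asynchronously; if this write fails you get a gap in the sequence, never a duplicate
  GeneratedId.updateOne({ key }, { $set: { value: nextId } }, { upsert: true })
    .exec()
    .catch(err => console.error("failed to persist last id", err));
  return nextId;
}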


If there is the slightest chance that you may need to scale up to more than 1 nodejs process to handle more simultaneous requests, or add another host for redundancy in the future, you will need to synchronise generation of the ID, and you can employ mongodb unique indexes for that. The function itself doesn't change much: you still generate the ID as in the single-threaded architecture, but add an extra step to save it to mongo. The collection should have a unique index on the ID field, so in case of concurrent updates one of the queries will successfully add the document and the other will fail with "E11000 duplicate key error". You catch such errors on the nodejs side and run the function again, picking the next number.
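A sketch of that retry loop (IssuedId is an illustrative Mongoose model with a unique index on its id field; createNextId is the question's helper):

async function generateUniqueId(lastKnownId: string): Promise<string> {
  let candidate = createNextId(lastKnownId);
  for (;;) {
    try {
      // the unique index guarantees only one concurrent client can insert a given id
      await IssuedId.create({ id: candidate });
      return candidate;
    } catch (err: any) {
      if (err?.code !== 11000) throw err;   // only retry on duplicate key errors
      candidate = createNextId(candidate);  // another client took this number: pick the next one and repeat
    }
  }
}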


Upvotes: 2
