Sagar

Reputation: 5596

MongoDB linking and atomic operations

Suppose there are two separate collections: Cities {name: "NYC", area: "1223", people: [1,2,3]} and Person {personId: 1, name: "abc"}.

As you can see, each person's ID is linked from Cities (much like a foreign key in an RDBMS).

Now, if I decide to delete the person with ID = 1, then I want to update the people array in Cities, as a cascade delete would in an RDBMS. I know MongoDB has no cascade delete operation, as per this, and since MongoDB has no transactions, the database could end up in an inconsistent state if we somehow fail to perform the "cascade delete" step on Cities. How can I make sure the whole "delete person + cascade delete on cities" is an atomic operation in MongoDB? If I can't, should we always prefer embedding over linking?

Upvotes: 2

Views: 1210

Answers (3)

Sammaye

Reputation: 43884

How can I make sure the whole "delete person + cascade delete on cities" is an atomic operation in MongoDB?

As others have said, MongoDB has a client-side version of two-phase commit. However, it:

  • Does not provide a true two-phase transactional commit
  • Is not atomic
  • Has considerable overhead
  • Is client-side (which loses any benefit you would get from it being server-side, i.e. when something fails)
  • Cannot guarantee the safety of information and transactions in the event of a failure

That being said, if you can rely on your application being able to write to the database without failing, then the MongoDB version of two-phase commit could work here. But in that case, why not just run one query after the other instead of adding the extra overhead of a fake two-phase commit?

It is normally assumed that if one of the delete queries succeeds, you can write to MongoDB. In the event they do not succeed, most people "mark" the parent row of the cascade as deleted (or something similar) and have a dedicated cron job that comes back later and cleans everything up consistently, since doing that there and then would delay the client.
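
The mark-then-clean-up approach can be sketched roughly as follows. This is only an illustration: it assumes `db` is a PyMongo database handle, uses the question's `Person` and `Cities` collections, and the `deleted` and `deletedAt` field names are made up for the example.

```python
from datetime import datetime, timezone


def mark_person_deleted(db, person_id):
    """Request path: flag the person instead of removing it right away."""
    # A single-document update, which MongoDB applies atomically.
    # "deleted" and "deletedAt" are hypothetical field names.
    db["Person"].update_one(
        {"personId": person_id},
        {"$set": {"deleted": True, "deletedAt": datetime.now(timezone.utc)}},
    )


def cleanup_deleted_people(db):
    """Cron job: finish the cascade for every flagged person."""
    cleaned = []
    # Materialise the matches first so deleting does not disturb the iteration.
    for person in list(db["Person"].find({"deleted": True})):
        pid = person["personId"]
        # Drop the reference first. $pull does nothing if the id is already
        # gone, so rerunning the job after a crash is harmless.
        db["Cities"].update_many({"people": pid}, {"$pull": {"people": pid}})
        # Only then remove the person document itself.
        db["Person"].delete_one({"personId": pid})
        cleaned.append(pid)
    return cleaned
```

Readers of `Person` would have to skip documents flagged as deleted until the job has run; that is the price of not making the client wait.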

As for which schema design is best, it is not true that embedding is always preferred. I noticed that you say:

then I want to update people array in Cities

That would most likely require a $pull or something similar on that array. Note that if the array grows considerably, the in-memory $pull operation will be somewhat slower than querying two separate collections.

At the end of the day, we cannot really advise on your schema design because we don't know enough, so I will leave it at that.

Edit

Both of the other answers do make a point, though. If you embed the city_id in the person document, you can actually cascade the relation in a single call. Of course, this is an odd one-off; normally you might have too many child records to fit into a single MongoDB document, but this scenario fits.
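
A rough sketch of what that looks like, assuming `db` is a PyMongo database handle and that each person document has the hypothetical shape {personId: 1, name: "abc", city_id: "NYC"}:

```python
def delete_person(db, person_id):
    """Delete a person whose document carries its own city reference."""
    # No city document lists this person, so there is nothing left to cascade.
    # A delete of one document is atomic in MongoDB.
    result = db["Person"].delete_one({"personId": person_id})
    return result.deleted_count


def people_in_city(db, city_id):
    """Residents are found by querying Person, not by reading an array in Cities."""
    return list(db["Person"].find({"city_id": city_id}))
```

The trade-off is that listing a city's residents becomes a query on Person, so an index on `city_id` would probably be worth having.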

Upvotes: 1

Alderis Shyti

Reputation: 278

Embedding is preferable to linking, both for speed of operation and for consistency. What you are trying to achieve would be a transaction in MongoDB, which is not atomic since it involves more than one document.

You could, though, very easily have each person document reference its city rather than have each city reference every person. That is a sounder structure for the many-to-one relationship you have.

If data consistency is important to your application in cases like this, you might want to consider MongoDB's two-phase commit pattern, which emulates RDBMS transactions.

Upvotes: 3

br3w5

Reputation: 4593

Your schema design should match your data access patterns. It would be better to include a 'city:NYC' key:value pair in a people collection rather than the personID in the Cities collection: if you stored an ID for every person in NYC, the array would have millions of elements and you would probably exceed the 16 MB size limit MongoDB imposes on documents.

It is also easier to update a person's city, which happens rarely, than to keep updating the list of people in a city, which changes all the time. Probably the safest way to manage multiple updates is through your application code.
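
To put a rough number on the size concern, here is a back-of-the-envelope estimate of how large the people array alone would get. It assumes the IDs are stored as 32-bit integers (other ID types, such as doubles or ObjectIds, take more space) and follows the BSON layout for arrays, which stores each element with its index as a string key. The figure of 8,000,000 residents is just an illustrative assumption.

```python
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's per-document size limit


def bson_int32_array_size(n):
    """Bytes needed for a BSON array holding n int32 values."""
    # A BSON array is a document keyed by "0", "1", ...:
    # 4-byte length prefix + elements + 1 terminating null byte.
    size = 4 + 1
    digits, start = 1, 0
    while start < n:
        # Indices start .. end-1 all have `digits` decimal digits.
        end = min(n, 10 ** digits)
        # Per element: type byte + key digits + key terminator + int32 value.
        size += (end - start) * (1 + digits + 1 + 4)
        start, digits = end, digits + 1
    return size


# Hypothetical city size, only to illustrate the order of magnitude.
residents = 8_000_000
print(bson_int32_array_size(residents) > MAX_BSON_DOCUMENT_BYTES)
```

Under these assumptions a million IDs still fits in a single document, but a city with several million residents does not.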

Upvotes: 0
