Reputation: 5596
Suppose there are two separate collections: Cities {name: "NYC", area: "1223", people: [1, 2, 3]} and Person {personId: 1, name: "abc"}.
As you can see, we have linked the person's Id (a kind of foreign key in RDBMS terms) in Cities.
Now if I decide to delete the person with Id = 1, I want to update the people array in Cities as well, the way a cascade delete works in an RDBMS. I know we don't have a cascade delete kind of operation as per this, and since we don't have transactions in MongoDB, there is a chance of the database ending up in an inconsistent state if we somehow fail to perform the "cascade delete" step on Cities. How can I make sure the whole "delete person + cascade delete on Cities" runs as an atomic operation in MongoDB? If we can't, should we always prefer embedding over linking?
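For reference, the two operations I have in mind look something like this in the shell, using the example documents above:

    // delete the person
    db.Person.remove({ personId: 1 });

    // then drop the reference from every city that still links to that person
    db.Cities.update(
        { people: 1 },
        { $pull: { people: 1 } },
        { multi: true }
    );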
Upvotes: 2
Views: 1210
Reputation: 43884
How can I make sure the whole "delete person + cascade delete on Cities" runs as an atomic operation in MongoDB?
As others have said, MongoDB has a client-side version of a two-phase commit. If you can rely on your application being able to write to the database without failing, then MongoDB's edition of a two-phase commit could work here; however, in that case why not simply run one query after the other instead of adding the extra overhead of a faked two-phase commit?
It is normally assumed that one can write to MongoDB if one of the delete queries succeeds; in the event the others do not, most people "mark" the parent row of the cascade as deleted and have a dedicated cron job come back later and clean it all up in a consistent manner (doing it there and then would delay the client).
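A rough sketch of that mark-and-clean-up approach could look like the following; the deleted flag and the cleanup function are just illustrative, not a standard API:

    // mark the person as deleted instead of removing it straight away
    db.Person.update({ personId: 1 }, { $set: { deleted: true } });

    // a cron job (or similar background task) later finishes the cascade
    function cleanupDeletedPersons() {
        db.Person.find({ deleted: true }).forEach(function (p) {
            // strip the reference from any city that still links to this person
            db.Cities.update(
                { people: p.personId },
                { $pull: { people: p.personId } },
                { multi: true }
            );
            // only remove the person once the cascade has succeeded
            db.Person.remove({ personId: p.personId });
        });
    }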
As for which schema design is best, it is not true that embedding is preferred. I have noticed that you say:
then I want to update people array in Cities
Which would most likely require a $pull or something similar on that array. I should note that if that array grows considerably, the in-memory operation of $pull will be somewhat slower than querying two separate collections.
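For comparison, if the relation lived in its own collection instead of an embedded array (say a hypothetical CityPeople collection holding { cityName, personId } pairs), the cascade becomes a plain delete rather than an in-memory array rewrite:

    // hypothetical link collection: { cityName: "NYC", personId: 1 }
    db.CityPeople.remove({ personId: 1 });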
At the end of the day, we cannot really advise on your schema design because we don't know enough so I will just leave it at that.
Though both of the other answers do make a point: if you embed the city_id into the person document, you can actually cascade the relation in a single call. Of course this is an odd one-off; normally you might have too many child records to fit into a Mongo document, but this scenario fits.
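A quick sketch of what that looks like; the city_id values here are just for illustration:

    // person documents carry the reference: { personId: 1, name: "abc", city_id: "NYC" }

    // deleting a person leaves nothing dangling in Cities,
    // since Cities no longer holds a people array
    db.Person.remove({ personId: 1 });

    // and removing all of a city's people cascades in one call as well
    db.Person.remove({ city_id: "NYC" });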
Upvotes: 1
Reputation: 278
Embedding is preferable to linking, both for speed of operation and for consistency. What you are trying to achieve would be a transaction in MongoDB, which is not atomic since it involves more than one document.
You could very easily, though, have the city referenced from each of the person documents rather than have each city reference each person. That is a sounder structure for the many-to-one relationship you have.
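For example, the documents might then be shaped like this (the field name city is just one way to name the reference):

    // Cities no longer needs a people array
    { name: "NYC", area: "1223" }

    // each Person points to its city instead
    { personId: 1, name: "abc", city: "NYC" }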
If data consistency is important for your application in cases like this, you might want to consider using MongoDB's two-phase commit pattern, which emulates RDBMS transactions.
Upvotes: 3
Reputation: 4593
Your schema design should match your data access patterns. It would be better to include a 'city: NYC' key/value pair in a people collection rather than the personID in the Cities collection, because if you stored an id for every person in NYC the array would consist of millions of elements and you would probably exceed the 16 MB document size limit imposed on MongoDB documents.
This also makes it easier to update a person's city (which happens relatively rarely) than to maintain the set of people in a city, which changes all the time. The safest way to manage updates that span multiple documents is probably through your application code.
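As an illustration, moving a person to a different city is then a single-document update, which MongoDB applies atomically (the destination city here is made up):

    // moving a person between cities touches only one document
    db.people.update(
        { personId: 1 },
        { $set: { city: "Boston" } }
    );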
Upvotes: 0