Reputation: 338
For instance, assume I have a MongoDB database that stores a number of schools, along with the teachers and students in those schools. Instead of making each school its own collection, I have three collections, Schools, Teachers, and Students, and each document in Students and Teachers holds a reference to its school in the Schools collection. Is there a way to logically or physically group the data so that Teacher and Student documents are grouped under their respective School documents?
As of now, I have three different collections, Schools, Teachers, and Students. Let's say I want all students that attend StackOverflow Academy; I'd do something like:
Students.find({school: "stackOverFlowAcademy_ID"})
But as the database grows, I assume this approach won't stay as fast and efficient as it is on a small database.
Is my current approach enough, or is there a more efficient way to do this?
EDIT:
The MongoDB docs state that if you're using MongoDB Atlas (which I am), sharding and other effective "grouping" of data is handled automatically on their end, so there is no need to implement sharding or replica sets yourself when using Atlas.
Upvotes: 0
Views: 131
Reputation: 17935
This is a broad topic, so I'll list a few things I'm aware of:
Replica sets: A replica set is a group of mongod instances that host the same data set. When you create a MongoDB deployment through MongoDB Atlas, you get a cluster with three nodes, which is simply three mongod instances; their primary purpose is high availability. Having a replica set most likely has nothing to do with your data structure. A replica set usually has one primary node and two secondaries (which can serve read requests). If the primary goes down, one of the secondaries becomes the primary and serves requests; once the failed node is back, its data is synced. All of this is taken care of by MongoDB Atlas, and the median time to elect a new primary is typically around 12 seconds.
Sharding: As far as I know, sharding becomes the better option once your database grows beyond roughly 2 TB or 4 TB (please verify these numbers). It is horizontal scaling: instead of increasing the RAM and disk of a single server, you add more servers. In short, a sharded cluster is a set of replica sets, called shards, plus config servers, with mongos routing the queries. There is a lot to learn in depth before implementing it.
Going back to your question: yes, keeping a reference key between multiple collections is a valid option. With the aggregation framework, particularly $lookup and $graphLookup, you can do most of your mappings. Remember to maintain good indexes for better querying. All in all, you need to analyze your application's data before you start. Use the query analyzer (explain) in MongoDB to check the performance stats of each query.
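As a rough sketch of the reference-plus-$lookup approach, the snippet below builds a pipeline that returns one school with its students attached, and indexes the reference field. The collection names (`schools`, `students`), the `school` field, and the helper function names are assumptions based on the question, not a definitive implementation; the collection object is expected to behave like a Node.js driver collection.

```javascript
// Sketch only: "schools", "students" and the "school" field are assumed names.

// Build an aggregation pipeline that returns one school with its students attached.
function studentsBySchoolPipeline(schoolId) {
  return [
    { $match: { _id: schoolId } },
    {
      $lookup: {
        from: "students",       // collection to join
        localField: "_id",      // field on the school document
        foreignField: "school", // reference stored on each student
        as: "students",         // name of the output array field
      },
    },
  ];
}

// Index the reference field so find({ school: ... }) and the $lookup
// do not have to scan the whole students collection.
async function ensureSchoolIndex(students) {
  return students.createIndex({ school: 1 });
}

// Possible usage with the Node.js driver (db is an assumed connected Db handle):
//   await ensureSchoolIndex(db.collection("students"));
//   const schools = await db.collection("schools")
//     .aggregate(studentsBySchoolPipeline("stackOverFlowAcademy_ID")).toArray();
//   const plan = await db.collection("students")
//     .find({ school: "stackOverFlowAcademy_ID" }).explain("executionStats");
```

In the explain output, comparing the number of documents examined with the number returned is a quick way to see whether the index is actually being used.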
Example:
MongoDB's document model favors denormalized data, so you can definitely consider embedded documents, but you need to know when to embed and when not to.
Say you're building a social media website. You'd have a users collection that stores each user with their related information (phone number, height, date of birth, email), and each user can have an embedded document of addresses (one or two), which usually won't change often. The list of friends, however, should be stored in a separate collection: it needs much more maintenance, it can be accessed on its own, and keeping it separate leaves the user document smaller, holding only the important data. It all comes down to your data relationships (such as one-to-many) and your querying needs.
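To make the embed-versus-reference split concrete, here is a minimal sketch of the two document shapes. All names and values (`user_1`, the `friendships` collection, the `friendsOfFilter` helper) are hypothetical and only illustrate the idea.

```javascript
// Hypothetical user document: addresses are embedded because there are only a
// few of them and they rarely change.
const user = {
  _id: "user_1",
  email: "ada@example.com",
  addresses: [
    { label: "home", city: "London" },
    { label: "work", city: "Cambridge" },
  ],
};

// Hypothetical documents for a separate "friendships" collection: one small
// document per relationship, so the user document does not grow with the list.
const friendships = [
  { user: "user_1", friend: "user_2" },
  { user: "user_1", friend: "user_3" },
];

// Filter that would be passed to find() on the friendships collection.
function friendsOfFilter(userId) {
  return { user: userId };
}
```

Reading a profile then touches a single user document, while the friend list is queried separately and only when it is needed.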
Check these links:
MongoDB University's courses are free and among the best ways to learn; they are offered directly by MongoDB.
In Mongo what is the difference between sharding and replication?
Upvotes: 0