Reputation: 1161
I am still getting used to a schema-less, document-oriented database, and I am wondering what the generally accepted practice is regarding schema design within an application model.
Specifically, I'm wondering whether it is good practice to enforce a schema within the application model when saving to MongoDB, like this:
{
    _id: "foobar",
    name: "John",
    billing: {
        address: "8237 Landeau Lane",
        city: "Eden Prairie",
        state: "MN",
        postal: null
    },
    balance: null,
    last_activity: null
}
versus storing only the fields that are used, like this:
{
    _id: "foobar",
    name: "John",
    billing: {
        address: "8237 Landeau Lane",
        city: "Eden Prairie",
        state: "MN"
    }
}
The former is self-descriptive, which I like, while the latter makes no assumptions about the mutability of the model schema.
I like the first option because it makes it easy to see at a glance which fields are used by the model yet currently unspecified, but it seems like it would be a hassle to update every document to reflect a new schema design if I wanted to add an extra field, like favorite_color.
How do most veteran MongoDB users handle this?
Upvotes: 2
Views: 869
Reputation: 43884
I prefer the first option. It is easier to code within the application and requires far fewer state holders and functions to understand how things should work.
As for adding a new field over time, you don't need to update all your records to support it like you would in SQL. All you need to do is add the new field to your model on the application side and treat it as null if it is not returned from MongoDB.
A good example is in PHP. I have a User class that at first has only one field, name:
class User{
public $name;
}
Six months down the line and 60,000 users later, I want to add, say, address. All I do is add that variable to my application model:
class User{
public $name;
public $address = array();
}
This now works exactly like adding a new null field in SQL, without having to actually add it to every row.
It is a very reactive design: don't update what you don't need to. If that row gets used it will get updated; if not, then who cares.
So eventually your rows actually become a mix and match between option 1 and 2 but it is really a reactive option 1.
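A minimal sketch of that reactive pattern, written in JavaScript rather than PHP and using an in-memory Map as a stand-in for a collection (no real driver is involved). The names USER_DEFAULTS, loadUser and saveUser are hypothetical and only illustrate the idea:

```javascript
// Hypothetical application-side schema: every field the model knows about.
const USER_DEFAULTS = { name: null, address: null };

// In-memory stand-in for a collection, keyed by _id (not a real driver).
const users = new Map([
  ["a", { _id: "a", name: "John" }], // written before `address` existed
  ["b", { _id: "b", name: "Jane" }], // never touched again
]);

// Reading: fields missing from storage come back as null.
function loadUser(id) {
  return { ...USER_DEFAULTS, ...users.get(id) };
}

// Writing: the full model is stored, so only documents that are
// actually used pick up the new field.
function saveUser(user) {
  users.set(user._id, { ...user });
}

const john = loadUser("a");
john.address = { city: "Eden Prairie" };
saveUser(john);

console.log("address" in users.get("a")); // true: upgraded when it was saved
console.log("address" in users.get("b")); // false: left as it was stored
```

Document "b" stays in the old shape until something saves it, yet the application still sees a null address when it loads it.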
On the storage side you have also got to think of pre-allocation and movement of documents.
Say the fields currently set in a record make up only a third of the full document. When the user later updates the document with all of the fields, it outgrows its allocated space and has to be moved, and you now have extra fragmentation from the movement of your docs.
Normally when you are defining a schema like this you are defining one that will eventually grow and apply to that user in most cases (much like an SQL schema does).
Take this into consideration: even though storage might be lower in the short term, the resulting fragmentation can slow down querying, and you could easily find yourself having to run compact or repairDatabase to deal with the problems you now face.
I should mention that neither of those commands is designed to be run regularly, and both carry a significant performance cost while they run in a production environment.
So really with the structure above you don't need to add a new field across all documents and you will most likely get less movement and problems in the long run.
You can fix the performance problems of consistently growing documents by using power of 2 sizes padding, but this is collection-wide, which means that even your fully filled documents can take up to about double their previous space, and your small documents will probably use as much space as your full documents would have with a padding factor of 1.
In other words, you lose space rather than gain it.
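To make the arithmetic concrete, here is a rough sketch of a simplified power-of-2 allocation model. The document sizes are made up and the 32-byte minimum is an assumption for illustration; record headers and upper size limits are ignored:

```javascript
// Simplified model: round a record size up to the next power of two.
// Record headers and upper size limits are ignored here.
function powerOf2Size(bytes) {
  let size = 32; // assumed smallest allocation, for illustration only
  while (size < bytes) size *= 2;
  return size;
}

const sparseDoc = 520; // hypothetical size with only the used fields stored
const fullDoc = 900;   // hypothetical size once every field is filled in

console.log(powerOf2Size(sparseDoc)); // 1024
console.log(powerOf2Size(fullDoc));   // 1024
```

With these particular sizes the sparse document is allocated the same 1024 bytes as the full one, more than the 900 bytes the full document would need without rounding.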
Upvotes: 2
Reputation: 4159
I would suggest the second approach.
In any case, it all comes down to your DB size. If you aren't targeting many GBs or TBs of data, then both approaches are fine. But if you predict that your DB may grow really large, I would do anything to cut the size. Spending 30-40% of storage on field names is a bad idea.
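As a rough way to see the overhead on the two documents from the question, the sketch below counts the characters spent on field names and compares them with the length of the JSON text. JSON is only a stand-in for the real BSON encoding, so the ratios are indicative at best, and keyChars is a hypothetical helper:

```javascript
const full = {
  _id: "foobar",
  name: "John",
  billing: { address: "8237 Landeau Lane", city: "Eden Prairie", state: "MN", postal: null },
  balance: null,
  last_activity: null,
};

const sparse = {
  _id: "foobar",
  name: "John",
  billing: { address: "8237 Landeau Lane", city: "Eden Prairie", state: "MN" },
};

// Sum the lengths of all field names, including those in nested objects.
function keyChars(value) {
  if (value === null || typeof value !== "object") return 0;
  return Object.entries(value).reduce(
    (sum, [key, child]) => sum + key.length + keyChars(child),
    0
  );
}

for (const [label, doc] of [["full", full], ["sparse", sparse]]) {
  const total = JSON.stringify(doc).length; // ASCII only, so chars == bytes
  const share = ((keyChars(doc) / total) * 100).toFixed(1);
  console.log(`${label}: ${keyChars(doc)} of ${total} chars are field names (${share}%)`);
}
```

Every null field stored in the first layout pays for its name in every document, which is where the size difference comes from.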
Upvotes: 4