enginedave
enginedave

Reputation: 43

Embed or reference in Mongodb

I am developing a small app which will store information on users, accounts and transactions. The users will have many accounts (probably less than 10) and the accounts will have many transactions (perhaps 1000's). Reading the Docs it seems to suggest that embedding as follows is the way to go...

{
"username": "joe",
"accounts": [
    {
        "name": "account1",
        "transactions": [
            {
                "date": "2013-08-06",
                "desc": "transaction1",
                "amount": "123.45"
            },
            {
                "date": "2013-08-07",
                "desc": "transaction2",
                "amount": "123.45"
            },
            {
                "date": "2013-08-08",
                "desc": "transaction3",
                "amount": "123.45"
            }
        ]
    },
    {
        "name": "account2",
        "transactions": [
            {
                "date": "2013-08-06",
                "desc": "transaction1",
                "amount": "123.45"
            },
            {
                "date": "2013-08-07",
                "desc": "transaction2",
                "amount": "123.45"
            },
            {
                "date": "2013-08-08",
                "desc": "transaction3",
                "amount": "123.45"
            }
        ]
    }
 ]
}

My question is... Since the list of transactions will grow to perhaps 1000's within the document will the data become fragmented and slow the performance. Would I be better to have a document to store the users and the accounts which will not grow as big and then a separate collection to store transactions which are referenced to the accounts. Or is there a better way?

Upvotes: 3

Views: 1275

Answers (4)

Derick
Derick

Reputation: 36784

This is not the way to go. You have a lot of transactions, and you don't know how many you will get. Instead of this, you should store them like:

{
    "username": "joe",
    "name": "account1",
    "date": "2013-08-06",
    "desc": "transaction1",
    "amount": "123.45"
},
{
    "username": "joe",
    "name": "account1",
    "date": "2013-08-07",
    "desc": "transaction2",
    "amount": "123.45"
},
{
    "username": "joe",
    "name": "account1",
    "date": "2013-08-08",
    "desc": "transaction3",
    "amount": "123.45"
},
{
    "username": "joe",
    "name": "account2",
    "date": "2013-08-06",
    "desc": "transaction1",
    "amount": "123.45"
},
{
    "username": "joe",
    "name": "account2",
    "date": "2013-08-07",
    "desc": "transaction2",
    "amount": "123.45"
},
{
    "username": "joe",
    "name": "account2",
    "date": "2013-08-08",
    "desc": "transaction3",
    "amount": "123.45"
}

In a NoSQL database like MongoDB you shouldn't be afraid to denormalise. As you noticed, I haven't even bothered with a separate collection for users. If your users have more information that you will have to show with each transaction, you might want to consider including that information as well.

If you need to search on, or select by, any of those fields, then don't forget to create indexes, for example:

// look up all transactions for an account
db.transactions.ensureIndex( { username: 1, name: 1 } ); 

and:

// look up all transactions for "2013-08-06"
db.transactions.ensureIndex( { date: 1 } ); 

etc.

There are a lot of advantages to duplicate data. With a schema like above, you can have as many transactions as possible and you will never get any fragmentation as documents never change - you only add to them. This also increases write performance and also makes it a lot easier to do other queries.

Alternative

An alternative might be to store username/name in a collection and only use it's ID with the transactions:

Accounts:

{
    "username": "joe",
    "name": "account1",
    "account_id": 42,
}

Transactions:

{
    "account_id": 42,
    "date": "2013-08-06",
    "desc": "transaction1",
    "amount": "123.45"
},

This creates smaller transaction documents, but it does mean you have to do two queries to also get user information.

Upvotes: 6

Rotem Hermon
Rotem Hermon

Reputation: 2147

I would separate the transactions to a different collections. Seems like the data and update patterns between users and transactions are quite different. If transactions are constantly added to the user and causes it to grow all the time it will be moved a lot in the mongo file. So yes, it brings performance impact (fragmentation, more IO, more work for mongo). Also, array operation performance sometimes desegregates on big arrays in documents, so holding 1000s of object in an array might not be a good idea (depends on what you do with it).

Upvotes: 1

Sammaye
Sammaye

Reputation: 43884

Since the list of transactions will grow to perhaps 1000's within the document will the data become fragmented and slow the performance.

Almost certainly, infact I would be surprised if over a period of years transactions only reached into the thousands instead of 10's of thousand for a single account.

Added the level of fragmentation you will witness from the consistently growing document over time you could end up with serious problems, if not running out of root document space (with it being 16meg). In fact looking at the fact that you store all accounts for a person under one document I would say you run a high risk of filling up a document in the space of about 2 years.

I would reference this relationship.

Upvotes: 1

seridis
seridis

Reputation: 5

You should consider creating indexes, using the ensureIndex() function, it should reduce the risk of performance issues. The earlier you add these, the better you'll understand how the collection should be structured. I haven't been using mongo too long but I haven't come across any issues(not yet anyway) of data being fragmented

Edit If you intend to use this for multi-object commits, mongo doesn't support rollbacks. You need to use the 64bit version to allow journaling and make transactions durable.

Upvotes: 0

Related Questions