user2421024

Reputation: 51

mongodb - Recommended tree structure for large amount of data points

I'm working on a project which records price history for items across multiple territories, and I'm planning on storing the data in a mongodb collection.

As I'm relatively new to mongodb, I'm curious about what might be a recommended document structure for quite a large amount of data. Here's the situation:

I'm recording the price history for about 90,000 items across 200 or so territories. I'm looking to record the price of each item every hour and keep a 2-week history for any given item. That comes out to around (90000 * 200 * 24 * 14) ~= 6 billion data points, or approximately 67,200 per item. A cleanup query will be run once a day to remove records older than 14 days (more specifically, to archive them to a gzipped JSON/text file).
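For the cleanup, I'm picturing something like this (a rough sketch against the flat one-document-per-point layout I describe below, assuming time is stored as epoch milliseconds):

// Daily cleanup sketch: remove anything older than 14 days.
// The archive step would run first, outside the shell
// (e.g. mongoexport with the same filter, piped through gzip).
var cutoff = Date.now() - 14 * 24 * 3600 * 1000;
db.collection.remove({ time: { $lt: cutoff } })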

In terms of data that I will be getting out of this, I'm mainly interested in two things: 1) The price history for a specific item in a specific territory, and 2) the price history for a specific item across ALL territories.

Before I actually start importing this data and running benchmarks, I'm hoping someone might be able to give some advice on how I should structure this to allow for quick access to the data through a query.

I'm considering the following structure:

{
    _id: 1234,
    data: [
        {
            territory: "A",
            price: 5678,
            time: 123456789
        },
        {
            territory: "B",
            price: 9876,
            time: 123456789
        }
    ]
}
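Under this structure, the hourly write for each territory would just append a point to the array, something like:

// Hourly write sketch: push one territory/price/time point
// onto the item's data array.
db.collection.update(
    { _id: 1234 },
    { $push: { data: { territory: "B", price: 9876, time: 123456789 } } }
)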

Each item is its own document, with each territory/price/time point for that item stored in the data array. The issue I run into with this is retrieving the price history for a particular item in a particular territory. I believe I can accomplish this with the following query:

db.collection.aggregate([
    // Match the item first so only its document is unwound
    { $match: { _id: 1234 } },
    { $unwind: "$data" },
    { $match: { "data.territory": "B" } }
])
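For the second query (all territories), I assume I could simply drop the territory condition:

// Price history for one item across every territory
db.collection.aggregate([
    { $match: { _id: 1234 } },
    { $unwind: "$data" }
])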

The other alternative I was considering was to just put every single data point in its own document and put an index on the item and territory.

// Document 1
{
    item: 1234,
    territory: "A",
    price: 5679,
    time: 123456789
}
// Document 2
{
    item: 1234,
    territory: "B",
    price: 9676,
    time: 123456789
}
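With this layout, the supporting index and the two main reads would look something like this (I've added time to the compound index so the sorted reads can use it; db.collection stands in for whatever the collection ends up being called):

// Compound index covering both main queries
db.collection.createIndex({ item: 1, territory: 1, time: 1 })

// 1) History for one item in one territory
db.collection.find({ item: 1234, territory: "B" }).sort({ time: 1 })

// 2) History for one item across all territories
db.collection.find({ item: 1234 }).sort({ territory: 1, time: 1 })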

I'm just unsure whether having 6 billion documents with 3 indexes, or having 90,000 documents with 67,200 array objects each and querying them with an aggregation, would be better for performance.

Or perhaps there's some other tree structure or handling of this problem that you fine folks and MongoDB wizards can recommend?

Upvotes: 5

Views: 925

Answers (1)

wdberkeley

Reputation: 11671

I would structure the documents as "prices for a product in a given territory per fixed time interval". The time interval is fixed for the schema as a whole, but different choices produce different schemas, and the best one for your application will probably have to be decided by testing. Choosing a time interval of 1 hour gives your second schema idea, with ~6 billion documents in total. You could choose the time interval to be 2 weeks (don't). In my mind, the best time interval is 1 day, so the documents would look like this:

{
    "_id" : ObjectId(...), // could also use a combination of prod_id, terr_id, and time so you get a free unique index to look up by those 3 values
    "prod_id" : "DEADBEEF",
    "terr_id" : "FEEDBEAD",
    "time" : ISODate("2014-10-22T00:00:00.000Z"), // start of the day this document contains the data for
    "data" : [
        {
            "price" : 1234321,
            "time" : ISODate("2014-10-22T15:00:00.000Z") // start of the hour this data point is for
        },
        ...
    ]
}

I like the time interval of 1 day because it hits a nice balance between the number of documents (mostly relevant because of index sizes), the size of documents (the 16MB limit, and documents have to be piped over the network), and the ease of retiring old docs (hold 15 days, wipe + archive everything from the 15th day back at some point each day). If you put an index on { "prod_id" : 1, "terr_id" : 1 }, that should let you fulfill your two main queries efficiently. You can gain an additional performance boost by preallocating the document for each day so that updates are in-place.
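Preallocation could look roughly like this (the prices collection name is just for illustration; the hourly write then $sets into the right slot instead of growing the document):

var day = ISODate("2014-10-22T00:00:00Z");

// Preallocate one slot per hour so the document never grows after creation.
var slots = [];
for (var h = 0; h < 24; h++) {
    slots.push({ "price" : null, "time" : new Date(day.getTime() + h * 3600 * 1000) });
}
db.prices.insert({
    "prod_id" : "DEADBEEF",
    "terr_id" : "FEEDBEAD",
    "time" : day,
    "data" : slots
});

// Hourly update: an in-place $set into the preallocated slot (hour 15 here).
db.prices.update(
    { "prod_id" : "DEADBEEF", "terr_id" : "FEEDBEAD", "time" : day },
    { $set : { "data.15.price" : 1234321 } }
);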

There's a great blog post about managing time series data like this, based on experience building the MMS monitoring system. I've essentially lifted my ideas from there.

Upvotes: 2
