Chris Wilson

Reputation: 6719

Index multiple MongoDB fields, make only one unique

I've got a MongoDB database of metadata for about 300,000 photos. Each has a native ID that needs to be unique to protect against duplicate insertions. It also has a time stamp.

I frequently need to run aggregate queries to see how many photos I have for each day, so I also have a date field in the format YYYY-MM-DD. This is obviously not unique.

Right now I only have an index on the id property, like so (using the Node driver):

collection.ensureIndex(
   { id:1 }, 
   { unique:true, dropDups: true }, 
   function(err, indexName) { /* etc etc */ }
);

The group query for getting the photos by date takes quite a long time, as one can imagine:

collection.group(
        { date: 1 },
        {},
        { count: 0 },
        function ( curr, result ) { 
            result.count++;
        },
        function(err, grouped) { /* etc etc */ }
);

I've read through the indexing strategy, and I think I also need to index the date property. But I don't want to make it unique, of course (though I suppose it's fine to make it unique in combination with the unique id). Should I create a regular compound index, or can I chain the .ensureIndex() function and only specify uniqueness for the id field?

Upvotes: 1

Views: 1342

Answers (2)

Dharshan

Reputation: 404

Another option is to pre-aggregate in the schema itself: keep a counter for each day and increment it whenever you insert a photo. This way you don't need to run any aggregation jobs. You can also run some tests to determine whether this approach performs better than aggregation.
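One way to read the counter idea is as a separate collection holding one document per day. The sketch below assumes the Node driver's insertOne and updateOne methods; the dailyCounts collection, its { _id: date, count } shape, and the function name are hypothetical and not part of the original answer.

```javascript
// Sketch: keep one counter document per day in a separate collection.
// `photos` and `dailyCounts` are assumed to be Node driver Collections;
// the `dailyCounts` name and its { _id: date, count } shape are hypothetical.
async function insertPhotoAndCount(photos, dailyCounts, photo) {
  // Insert first: if the unique index rejects a duplicate id, this throws
  // and the counter is left untouched.
  await photos.insertOne(photo);

  // Upsert creates the day's counter on first use, then increments it.
  await dailyCounts.updateOne(
    { _id: photo.date },
    { $inc: { count: 1 } },
    { upsert: true }
  );
}
```

Note that the two writes are not atomic here, so a crash between them would leave the counter one short; whether that matters depends on how exact the daily counts need to be.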

Upvotes: 0

yǝsʞǝla

Reputation: 16412

MongoDB does not have "mixed" indexes that are only partially unique. On the other hand, why not use _id instead of your id field, if possible? It's already indexed and unique by definition, so it will prevent you from inserting duplicates.

Mongo can only use a single index in a query clause, which is important to consider when creating indexes. For this particular query and these requirements, I would suggest a separate unique index on the id field, which you get for free if you use _id. Additionally, you can create a non-unique index on the date field only. If you run a query like this:

db.collection.find({"date": "2013-01-02"}).count();

Mongo will be able to answer the query from the index alone (a covered query), which is the best performance you can get.
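As a rough sketch, the two separate indexes could be created with the Node driver's createIndex method, as below. The function name is hypothetical, and this assumes you keep your own id field rather than switching to _id.

```javascript
// Sketch: one unique index on `id`, one ordinary index on `date`.
// `collection` is assumed to be a Node driver Collection (or anything
// exposing a compatible createIndex(keys, options) method).
async function createPhotoIndexes(collection) {
  // Unique: rejects a second document with the same id.
  const idIndex = await collection.createIndex({ id: 1 }, { unique: true });

  // Not unique: many photos share the same date.
  const dateIndex = await collection.createIndex({ date: 1 });

  return [idIndex, dateIndex];
}
```

Each call is independent, so uniqueness applies only to the index that asks for it.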

Note that Mongo won't be able to use a compound index on (id, date) if you are searching by date only. Your query has to match the index prefix first, i.e. if you search by id, then the (id, date) index can be used.
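The prefix rule can be illustrated with a deliberately simplified model. The canUseIndex helper below is hypothetical and is not how the query planner is implemented; it only captures the idea that the query must constrain the leading field of the index.

```javascript
// Simplified model of the prefix rule: a compound index is a candidate
// only when the query constrains the index's leading field.
// (The real planner also considers sorts, ranges, and other factors.)
function canUseIndex(indexFields, queryFields) {
  return indexFields.length > 0 && queryFields.includes(indexFields[0]);
}

console.log(canUseIndex(['id', 'date'], ['date']));       // false
console.log(canUseIndex(['id', 'date'], ['id']));         // true
console.log(canUseIndex(['id', 'date'], ['id', 'date'])); // true
console.log(canUseIndex(['date'], ['date']));             // true
```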

Upvotes: 2
