azpublic

Reputation: 1404

MongoDB : how to index the keys of a Map

In Java I have an object that looks like this :

class MyDoc {
     ObjectId docId;
     Map<String, String> someProps = new HashMap<String,String>(); 
}

which, when persisted to MongoDB, produces the following document :

{
    "_id" : ObjectId("4fb538eb5e9e7b17b211d5d3"),
    "someProps" : {
        "4fda4993eb14ea4a4a149c04" : "PROCESSED",
        "4f56a5c4b6f621f092b00525" : "PROCESSED",
        "4fd95a2a0baaefd1837fe504" : "TODO"
    }
}

I need to query as follows:

DBObject queryObj =
    new BasicDBObject("someProps.4fda4993eb14ea4a4a149c04", "PROCESSED");
DBObject explain =
    getCollection().find(queryObj).hint("props_indx").explain();

which should read: find me the MyDoc documents that have a someProps entry with key "4fda4993eb14ea4a4a149c04" and value "PROCESSED".

I have millions of MyDoc documents stored in the collection so I need efficient indexing on the keys of the someProps embedded object.

The keys of the map are not known in advance (they are dynamically generated, not a fixed set), so I cannot create one index per someProps key. (At least I don't think I can; correct me if I'm wrong.)

I tried to create the index directly on someProps but querying took ages.

How can I index the keys of the someProps Map? Do I need a different document structure?

Important notes :

1. There can only be ONE element of someProps with a given key. For example :

{
"_id" : ObjectId("4fb538eb5e9e7b17b211d5d3"),
    "someProps" : {
        "4fda4993eb14ea4a4a149c04" : "PROCESSED",
        "4f56a5c4b6f621f092b00525" : "PROCESSED",
        "4f56a5c4b6f621f092b00525" : "TODO"
    }
}

would be invalid because 4f56a5c4b6f621f092b00525 cannot appear twice in the Map (hence the use of a Map in the first place).

2. I also need to update someProps efficiently, changing only the value (e.g. changing "4fda4993eb14ea4a4a149c04" : "PROCESSED" to "4fda4993eb14ea4a4a149c04" : "CANCELLED").

What are my options ?

Thanks.

Upvotes: 13

Views: 10451

Answers (3)

user150340


If you want to keep your properties embedded, you could also use the dynamic attributes pattern as proposed by Kyle Banker in "MongoDB in Action". So instead of putting the props in their own collection, you modify your mydocs collection to look like this:

{
  "_id" : ObjectId("4fb538eb5e9e7b17b211d5d3"),
  "someProps" : [
      { k: "4fda4993eb14ea4a4a149c04", v: "PROCESSED" },
      { k: "4f56a5c4b6f621f092b00525", v: "PROCESSED" },
      { k: "4fd95a2a0baaefd1837fe504", v : "TODO" }
  ]
}

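And then create a compound index on the embedded key and value fields (both fields go in the first argument; a second argument would be treated as index options):

db.mydoc.ensureIndex({'someProps.k': 1, 'someProps.v': 1})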

This is very close to what Sergio suggested, but your data will still be one document in a single collection.
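
To make the reshaping concrete, here is a minimal sketch in plain Java (no driver calls) of how the original Map could be converted to the k/v array form and back. The class and method names (KeyValueProps, toKeyValueList, toMap) are made up for illustration, and the plain maps stand in for whatever document type your driver or mapper uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of any MongoDB driver.
final class KeyValueProps {

    // Turns {"key": "value", ...} into [{k: "key", v: "value"}, ...],
    // preserving the iteration order of the input map.
    static List<Map<String, String>> toKeyValueList(Map<String, String> someProps) {
        List<Map<String, String>> entries = new ArrayList<>();
        for (Map.Entry<String, String> e : someProps.entrySet()) {
            Map<String, String> entry = new LinkedHashMap<>();
            entry.put("k", e.getKey());
            entry.put("v", e.getValue());
            entries.add(entry);
        }
        return entries;
    }

    // Rebuilds the map from the array form. If the array ever contained the
    // same k twice, the later entry would silently win here.
    static Map<String, String> toMap(List<Map<String, String>> entries) {
        Map<String, String> someProps = new LinkedHashMap<>();
        for (Map<String, String> entry : entries) {
            someProps.put(entry.get("k"), entry.get("v"));
        }
        return someProps;
    }

    public static void main(String[] args) {
        Map<String, String> someProps = new LinkedHashMap<>();
        someProps.put("4fda4993eb14ea4a4a149c04", "PROCESSED");
        someProps.put("4fd95a2a0baaefd1837fe504", "TODO");
        System.out.println(toKeyValueList(someProps));
    }
}
```

Two things to keep in mind with this layout. To match a key and its value on the same array element, the query would normally use $elemMatch on someProps rather than two separate dotted conditions, and a value change can be done with the positional operator (for example a $set on "someProps.$.v"). Also, as far as I know, an array does not by itself prevent the same k from appearing twice within one document, so that rule has to be enforced by the application.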

Upvotes: 12

Sergio Tulentsev

Reputation: 230286

I suggest expanding these properties into documents of their own. So your example:

{
    "_id" : ObjectId("4fb538eb5e9e7b17b211d5d3"),
    "someProps" : {
        "4fda4993eb14ea4a4a149c04" : "PROCESSED",
        "4f56a5c4b6f621f092b00525" : "PROCESSED",
        "4fd95a2a0baaefd1837fe504" : "TODO"
    }
}

becomes this

{_id: {id1: ObjectId("4fb538eb5e9e7b17b211d5d3"), id2: "4fda4993eb14ea4a4a149c04"}, v: "PROCESSED"}
{_id: {id1: ObjectId("4fb538eb5e9e7b17b211d5d3"), id2: "4f56a5c4b6f621f092b00525"}, v: "PROCESSED"}
{_id: {id1: ObjectId("4fb538eb5e9e7b17b211d5d3"), id2: "4fd95a2a0baaefd1837fe504"}, v: "TODO"}

Here id1 is the id of your former parent entity (be it an application or whatever) and id2 is the property id.

Uniqueness is enforced by the properties of the _id field. Atomic updates are trivial. Indexing is easy:

db.props.ensureIndex({'_id.id2': 1})

The only disadvantage is some storage overhead.
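
For what it's worth, here is a rough sketch in plain Java of the flattening step, i.e. turning one MyDoc with its Map into one record per property. PropFlattener and PropDoc are names invented for this example, and the parent id is kept as a hex String only so the snippet runs without the driver's ObjectId type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of any MongoDB driver.
final class PropFlattener {

    // One document of the separate props collection:
    // {_id: {id1: <parent id>, id2: <property key>}, v: <value>}
    static final class PropDoc {
        final String id1; // id of the former parent document (hex string here)
        final String id2; // property key
        final String v;   // property value

        PropDoc(String id1, String id2, String v) {
            this.id1 = id1;
            this.id2 = id2;
            this.v = v;
        }

        @Override
        public String toString() {
            return "{_id: {id1: " + id1 + ", id2: " + id2 + "}, v: " + v + "}";
        }
    }

    // Produces one PropDoc per map entry, in the map's iteration order.
    static List<PropDoc> flatten(String parentId, Map<String, String> someProps) {
        List<PropDoc> docs = new ArrayList<>();
        for (Map.Entry<String, String> e : someProps.entrySet()) {
            docs.add(new PropDoc(parentId, e.getKey(), e.getValue()));
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, String> someProps = new LinkedHashMap<>();
        someProps.put("4fda4993eb14ea4a4a149c04", "PROCESSED");
        someProps.put("4fd95a2a0baaefd1837fe504", "TODO");
        for (PropDoc doc : flatten("4fb538eb5e9e7b17b211d5d3", someProps)) {
            System.out.println(doc);
        }
    }
}
```

Because the (id1, id2) pair is the _id, inserting the same property twice for the same parent would be rejected by the database, which covers the "only one element per key" requirement without extra application logic.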

Upvotes: 2

Asya Kamsky

Reputation: 42342

What about structuring your document like this:

{
"_id" : ObjectId("4fb538eb5e9e7b17b211d5d3"),
    "someProps" : {
        "PROCESSED":["4fda4993eb14ea4a4a149c04","4f56a5c4b6f621f092b00525"],
        "TODO" : ["4f56a5c4b6f621f092b00526"],
        "CANCELLED" : [ ]
    }
}

The three advantages of this are:

  1. You can check whether some object is processed by flipping your query from "someProps.4fda4993eb14ea4a4a149c04", "PROCESSED" to "someProps.PROCESSED", "4fda4993eb14ea4a4a149c04".

  2. You can create an index on "someProps.TODO" and another one on "someProps.PROCESSED" (you can't create a compound index on several parallel arrays, but it sounds like you'd be querying by a single status, right?).

  3. You can atomically move a document from one state to another, like this:

db.collection.update({"someProps.PROCESSED": "4fda4993eb14ea4a4a149c04"},
                     {$pull:{"someProps.PROCESSED":"4fda4993eb14ea4a4a149c04"},
                      $push:{"someProps.CANCELLED":"4fda4993eb14ea4a4a149c04"}});
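
If it helps on the Java side, here is a small in-memory sketch of the same idea: inverting the original key-to-status Map into status-to-keys lists, plus a move that mirrors the $pull/$push pair above. StatusBuckets, groupByStatus and move are hypothetical names, and nothing here talks to MongoDB.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of any MongoDB driver.
final class StatusBuckets {

    // Inverts {key: status} into {status: [key, ...]}.
    static Map<String, List<String>> groupByStatus(Map<String, String> someProps) {
        Map<String, List<String>> byStatus = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : someProps.entrySet()) {
            byStatus.computeIfAbsent(e.getValue(), status -> new ArrayList<>())
                    .add(e.getKey());
        }
        return byStatus;
    }

    // In-memory analogue of $pull from one status array and $push to another.
    // Returns false, changing nothing, if the key is not in the "from" list.
    static boolean move(Map<String, List<String>> byStatus,
                        String key, String from, String to) {
        List<String> source = byStatus.get(from);
        if (source == null || !source.remove(key)) {
            return false;
        }
        byStatus.computeIfAbsent(to, status -> new ArrayList<>()).add(key);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> someProps = new LinkedHashMap<>();
        someProps.put("4fda4993eb14ea4a4a149c04", "PROCESSED");
        someProps.put("4f56a5c4b6f621f092b00525", "PROCESSED");
        someProps.put("4fd95a2a0baaefd1837fe504", "TODO");

        Map<String, List<String>> byStatus = groupByStatus(someProps);
        move(byStatus, "4fda4993eb14ea4a4a149c04", "PROCESSED", "CANCELLED");
        System.out.println(byStatus);
    }
}
```

The trade-off compared to the Map is that nothing in this structure stops the same id from sitting in two status arrays at once, so the move should always be done as a single update like the one shown above.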

Upvotes: 5
