K20GH
K20GH

Reputation: 6281

Query against nested object keys

What I am trying to do, is have my results list out where the history contains certain dates.

Below is an example of one of my documents in mongo. The history object below stores the date as the key and the 'number' as the key.

What I need to do is perform a query which will return all documents where the history keys (date) are within a certain range.

For example, the below document would be returned if the start date was 1505435121000 and the end date was 1505860712000. If the start date was 1451606400 and then end date was 1481906300 then the below document would not be returned

{
    "sold": 24,
    "index": "5",
    "searchRange": 1,
    "history": {
        "1505860712000": 103079,
        "1505773195000": 157659,
        "1505694076000": 92157,
        "1505609622000": 47861,
        "1505516353000": 78869,
        "1505435121000": 158278,
        "1505343796000": 229944
    },
    "createdAt": {
        "$date": "2017-09-20T17:18:49.665Z"
    },
    "updatedAt": {
        "$date": "2017-10-20T08:02:47.094Z"
    },
}

What I am doing at the moment is pulling all documents, and then filtering them. However, with a growing 10k+ documents, this is taking an extremely long time and is becoming very CPU intensive and inefficient

Upvotes: 1

Views: 312

Answers (1)

Neil Lunn
Neil Lunn

Reputation: 151230

MongoDB does not really deal with traversing Object keys well, so it would be better to have the data in an "array" form which it deals with more naturally

If you have MongoDB 3.4.4 or above then you can apply $objectToArray in the tranformation to enable the condition:

Model.native((err,collection) => {

  collection.aggregate([
    { "$redact": {
      "$cond": {
        "if": {
          "$gt": [
            { "$size": {
              "$filter": {
                "input": { "$objectToArray": "$history" },
                "as": "h",
                "cond": {
                  "$and": [
                    { "$gte": [ "$$h.k", startDate.toString() ] },
                    { "$lte": [ "$$h.k", endDate.toString() ] }
                  ]
                }
              }
            }},
            0
          ]
        },
        "then": "$$KEEP",
        "else": "$$PRUNE"
      }
    }}
  ])
  .toArray((err,results) => {
    // do something with results
  })
})

In short $objectToArray transforms the "object" into an "array" ( just in case the naming is not clear ) which has properties of "k" for the "key" and "v" for the "value" from each of the object entries. This "array" is then fed to $filter which applies conditions to see if each entry falls within the supplied criteria or not. If it does, then it's returned and if not then it's stripped from the returned array.

The eventual $size of the array tells you if any elements met the conditions, and this logical condition is expressed with $cond to determine whether to $$KEEP the document or $$PRUNE it from results in the $redact pipeline.

If you don't have that version and operator available, the you need to process the query with the JavaScript evaluation of $where:

Model.native((err,collection) => {

  collection.find({ "$where": function() {
    var startDate =  1505435121000,
        endDate = 1505860712000;

    return Object.keys(this.history).some( k =>
      k >= startDate.toString() && k <= endDate.toString()
    )
  }})
  .toArray((err,results) => {
    // do something with results
  })
})

Note there that you need the variables inside the function to be provided in context because that is how the expression is going to be sent to the server. So what you probably want to do is create a function which will take those as parameters and return a function which can actually be fed to the $where clause as an argument.

You might even find it an easier concept that the way that is transported to the server is actually as a "string", so you can in fact construct that whole JavaScript expression as a string if you want. As long as it evaluates on the server without error then it's fine.

It still has the same concept though as Object.keys extracts the "keys" from the object and Array.some() returns true if "any" of those "keys" falls within the range of those conditions.


Changing the data

As noted earlier, MongoDB does not natively work well with object keys. Hence it's generally better to present this sort of data in an array:

'history': [
  { "time": 150586071200, "value": 103079 },
  { "time": 1505773195000, "value": 157659 }
]

If you actually had that, then your query is actually really simple. Just writing the "native" mongodb part at this stage:

var startDate =  1505435121000,
        endDate = 1505860712000;

collection.find({ 
  "history": {
    "$elemMatch": {
      "time": { "$gte": startDate, "$lte": endDate }
    }
  }
})

And that's actually it. What's more is that you can actually "index" the data so an index can be used for selection. In neither of the other above examples is that possible since we are dealing with "keys" and not "values" as well as essentially needing to "compute" for each document.

So even if you say you cannot do this right now, there really is a compelling reason to change the structure. And note that the more documents in the collection, the worse the performance for a "computational scan" will actually get.

Upvotes: 2

Related Questions