Nick Long
Nick Long

Reputation: 1958

MongoDB Is searching for "null" faster than searching for "does not exist"?

My understanding is querying for null finds fields that are null as well as fields that don't exist.

The mongo manual also states that "$exists is not very efficient even with an index"

Should querying for null also be considered inefficient?

If you know that the field you are looking for exists but is null is it more efficient to write:

db.foo.find( { "y" : { $type : 10 } } )

than:

db.foo.find( { "y" : null } )

Assume the field is indexed

EDIT: To give some context of the way this is being used so people can suggest a better approach: I'm tracking processing steps that have happened on the documents. I have a timestamp for each step so I know when it occurred. Periodically a housekeeping process runs and checks for any tasks that haven't occurred (i.e. no timestamp exists for completion) to make sure nothing is missed

Upvotes: 3

Views: 1091

Answers (2)

Adam Comerford
Adam Comerford

Reputation: 21692

I did a quick test here and ran a couple of explains to see what each query looked like, first the one with the type specified:

PRIMARY> db.nulltest.find( { "a" : { $type : 10 } } ).explain()
{
    "cursor" : "BtreeCursor a_1",
    "nscanned" : 110011,
    "nscannedObjects" : 110011,
    "n" : 110011,
    "millis" : 121,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "a" : [
            [
                null,
                null
            ]
        ]
    }
}

Then the one with the null as the criteria:

PRIMARY> db.nulltest.find( { "a" : null  } ).explain()
{
    "cursor" : "BtreeCursor a_1",
    "nscanned" : 110011,
    "nscannedObjects" : 110011,
    "n" : 110011,
    "millis" : 122,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "a" : [
            [
                null,
                null
            ]
        ]
    }
}

The results look pretty much identical (1ms is not significant) and the index bounds are identical on the explain. Hence I suspect there is no difference here at all, so sticking with the more readable syntax is a good idea

Upvotes: 1

mnemosyn
mnemosyn

Reputation: 46341

I can't, with certainty, say which one is faster, but there's a thread in the news group where someone has problems with the performance of the $type-query.

Setting that aside, I wouldn't use the $type query for three reasons:

  1. it's harder to read and understand. Even a regular MongoDB user probably doesn't know what {type : 10} is off the top of his head.
  2. As you already pointed out, the semantics are different. The $type query allows 'two different kinds of null', i.e. a document where the field does not exist, and a document where the field exists but is null. Upon deserialization, this will probably turn into identical objects in most languages, but the $type-query will distinguish them.
  3. Indexes are made to index data, not meta information. Even if the query optimizer is able to map it, I'd still stick to the more straightforward solution.

I believe another performance risk might be the potentially low selectivity of the null value: If half of the documents have some value y and half of the documents have null y, querying for the null-elements will return a lot of documents, and iterating cursors can be expensive. That is only true if there are a lot of null values though.

Upvotes: 2

Related Questions