jbelis

Reputation: 695

How do MongoDB aggregation framework pipelines work?

I may have a fundamental misunderstanding of how MongoDB aggregation framework pipelines work. My expectation is that each step consumes the output of the preceding step. Here is a concrete example using the sample collection provided at http://media.mongodb.org/zips.json :

> db.zipcodes.aggregate({$match:{state:"CA"}});

produces results such as these.

    {
        "city" : "TRUCKEE",
        "loc" : [
            -120.295031,
            39.319321
        ],
        "pop" : 199,
        "state" : "CA",
        "_id" : "96162"
    }

So far so good. Then I decided to add another step to get a projection of the above, by running:

> db.zipcodes.aggregate({
    $match:{state:"CA"}, 
    $project: {city: 1, pop: 1, state: 1}
});

The projection works but ignores the first $match step. It is based on the original input, and includes documents in which state != CA:

    {
        "city" : "THAYNE",
        "pop" : 505,
        "state" : "WY",
        "_id" : "83127"
    }

Is my expectation misplaced or have I been staring at a syntax issue without seeing it? I am running version 2.2.0:

> db.version();
2.2.0

The example queries seem to work.

Thanks in advance.

Upvotes: 2

Views: 1698

Answers (1)

A. Jesse Jiryu Davis

Reputation: 24007

It's a syntax issue. You're running:

    db.zips.aggregate({$match:{state:"CA"}, $project: {city: 1, pop: 1, state: 1}})

... with both fields, $match and $project, in the same document. The aggregate command takes a series of distinct documents describing the stages of the pipeline:

    db.zips.aggregate({$match:{state:"CA"}}, {$project: {city: 1, pop: 1, state: 1}})

Apparently, if you put several fields in a pipeline stage, MongoDB only uses the last one. This is a bug that's been fixed on the next development branch: https://jira.mongodb.org/browse/SERVER-6861
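
To make the "one document per stage" idea concrete, here is a rough sketch in plain JavaScript. It is a toy model, not MongoDB's actual implementation: `applyStage` and `runPipeline` are hypothetical helpers made up for illustration, and they only handle equality `$match` and inclusion-style `$project`. The two sample documents are the ones from the question.

```javascript
// Toy model of pipeline semantics. This is NOT how the server is implemented;
// applyStage and runPipeline are hypothetical names used only for this sketch.
function applyStage(docs, stage) {
  const ops = Object.keys(stage);
  // Assumption of this sketch: a stage document holds exactly one operator.
  if (ops.length !== 1) {
    throw new Error("a stage must contain exactly one operator, got: " + ops.join(", "));
  }
  const op = ops[0];
  const spec = stage[op];
  if (op === "$match") {
    // Keep documents whose fields equal every value in the spec.
    return docs.filter(doc =>
      Object.keys(spec).every(field => doc[field] === spec[field]));
  }
  if (op === "$project") {
    // Keep _id plus every field flagged with a truthy value.
    return docs.map(doc => {
      const out = { _id: doc._id };
      for (const field of Object.keys(spec)) {
        if (spec[field] && field in doc) out[field] = doc[field];
      }
      return out;
    });
  }
  throw new Error("unsupported operator in this sketch: " + op);
}

// Each stage consumes the output of the preceding stage.
function runPipeline(docs, stages) {
  return stages.reduce(applyStage, docs);
}

const sample = [
  { city: "TRUCKEE", loc: [-120.295031, 39.319321], pop: 199, state: "CA", _id: "96162" },
  { city: "THAYNE", pop: 505, state: "WY", _id: "83127" },
];

// Two separate stage documents: $match runs first, then $project sees only its output.
const result = runPipeline(sample, [
  { $match: { state: "CA" } },
  { $project: { city: 1, pop: 1, state: 1 } },
]);
console.log(result);
```

With two separate stage documents, only the TRUCKEE document reaches the projection. Passing a single object that contains both `$match` and `$project` makes this sketch throw instead of silently dropping one operator, which is the kind of strictness you would want from the server rather than the 2.2.0 behaviour described above.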

Upvotes: 3
