MongoDB: "an object representing an expression must have exactly one field" when using $cond, $size and $in in an aggregation pipeline

Question

I'm trying to work out for each country how many times the UK appears with the following query:

    db.movies.aggregate([
    {$unwind:"$countries"},
    {$group:{_id:"$countries", total : { $sum: {$cond: [ {
    "countries": {"$in" : ["UK", ["$countries"]], "$not" : { "$size" : 1.0}}}, 1, 0]}}}},
    {$match: { total: { $gte: 10 }}}
    ])

which gives me the error "an object representing an expression must have exactly one field: { $in: [ "UK", [ "$countries" ] ], $not: { $size: 1.0 } }", although when I try the part giving me problems by itself in a find, it works fine:

    db.movies.find(
    { "countries" : { "$in" : ["UK", ["$countries"]], "$not" : { "$size" : 1}}},
    { "_id" : 0, "countries" : 1});

result:

{ "countries" : [ "UK", "New Zealand" ] }
{ "countries" : [ "USA", "UK" ] }
{ "countries" : [ "UK", "USA", "France" ] }
{ "countries" : [ "UK", "USA" ] }
{ "countries" : [ "USA", "UK" ] }
{ "countries" : [ "UK", "Italy" ] }...

Joe · Accepted Answer

One of the most confusing aspects of MongoDB is that it has two query contexts: the Mongo Query Language (MQL) and the Aggregation Framework. Some operators exist in both, but have very different syntax and behavior in each context.

The MQL is used in find filter expressions, and in $match expressions in an aggregation pipeline.

Aggregation expressions are used throughout the rest of the pipeline, and may be used in find or $match as part of an $expr expression.
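To make the two contexts concrete, here is a minimal sketch of the same test written both ways. The variable names `mqlFilter` and `exprFilter` are only illustrative, and the two filters should only be expected to agree when every document has an array-valued `countries` field, since the aggregation operators raise an error on non-array input.

```javascript
// MQL form: the field name is the key, and the operators describe
// which values of that field match.
const mqlFilter = {
  countries: { $in: ["UK"], $not: { $size: 1 } }
};

// Aggregation form wrapped in $expr: each operator is an expression
// that takes the field path ("$countries") as an argument.
const exprFilter = {
  $expr: {
    $and: [
      { $in: ["UK", "$countries"] },
      { $ne: [{ $size: "$countries" }, 1] }
    ]
  }
};

// In mongosh, either object could be passed as the filter:
//   db.movies.find(mqlFilter, { _id: 0, countries: 1 })
//   db.movies.find(exprFilter, { _id: 0, countries: 1 })
```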

Confused yet?

Good, so was I when I first saw those.

I see 3 problems with the query provided:

1. An object is passed to $cond instead of a boolean expression.
2. MQL expressions are used where an aggregation expression is needed.
3. Array operations are applied to a string value.

1. $cond

From the documentation, $cond has the following form:

{ $cond: [ <boolean-expression>, <true-case>, <false-case> ] }

In your example, the <boolean-expression> is

{"countries": {"$in" : ["UK", ["$countries"]], "$not" : { "$size" : 1.0}}}

That is a valid query object for the Mongo Query Language, but it is not a boolean expression. I'll come back to that in a moment.

2a. $size

$size as an MQL operator matches any array with the number of elements specified by the argument. The usage in your query is consistent with using $size as this operator.

However, in a $cond expression inside of a $group stage, the aggregation form of the $size operator should be used. In this context $size counts and returns the total number of items in an array, i.e. it yields a number, not a boolean.

Inside of the $cond expression, to test if the size of the countries array is not equal to 1, use:

{$ne: [{$size:"$countries"},1]}

2b. $in

When using $in as an MQL operator, the syntax is:

{ field: { $in: [ <value1>, <value2>, ... <valueN> ] } }

When using $in as an aggregation operator, the syntax is:

{ $in: [ <expression>, <array expression> ] }

In this query, to test if the countries array contains "UK", use:

{$in: ["UK", "$countries"]}

$cond + $size + $in + $and

To combine these all together, use the $and operator:

{$cond: [
    {$and: [
        {$in: ["UK", "$countries"]},
        {$ne: [{$size:"$countries"},1]}
    ]},
    1,
    0
]}
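As a rough sketch of where this expression could sit, the pipeline below counts the documents for which the condition is true. It assumes `countries` is still an array at that point (that is, it has not been unwound), and the names `ukCondition` and `pipeline` are hypothetical.

```javascript
// The combined boolean expression from above.
const ukCondition = {
  $and: [
    { $in: ["UK", "$countries"] },
    { $ne: [{ $size: "$countries" }, 1] }
  ]
};

// A single $group stage that adds 1 for each document where the
// condition holds and 0 otherwise.
const pipeline = [
  {
    $group: {
      _id: null,
      total: { $sum: { $cond: [ukCondition, 1, 0] } }
    }
  }
];

// In mongosh: db.movies.aggregate(pipeline)
```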

3. Array operators

The $unwind stage will explode the array so that each document passed to the following pipeline stages contains only one element from the original array. Thus the field countries in the documents passed to the $group stage will be a single string value, not an array.

To test this, run the aggregation with just the $unwind stage, and examine the results.
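If no database is at hand, the effect can be approximated in plain JavaScript. This is only an emulation of the basic case, not MongoDB's implementation: it assumes every document has an array-valued `countries` field, and `unwindCountries` and the sample `_id` values are made up for illustration.

```javascript
// Two sample documents shaped like the ones in the question.
const movies = [
  { _id: 1, countries: ["UK", "New Zealand"] },
  { _id: 2, countries: ["USA", "UK"] }
];

// Rough stand-in for {$unwind: "$countries"}: emit one copy of the
// document per array element, with the array replaced by that element.
function unwindCountries(docs) {
  return docs.flatMap(doc =>
    doc.countries.map(country => ({ ...doc, countries: country }))
  );
}

const unwound = unwindCountries(movies);

// Each output document now carries a single string in `countries`,
// so array operators such as $size or $in no longer apply to it.
console.log(unwound);

// The real check in mongosh would be:
//   db.movies.aggregate([{ $unwind: "$countries" }])
```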

EDIT

Based on the statements in the comments, the goal is to count the occurrences of each other country that appears in an array that also contains "UK". To accomplish this, use the following aggregation stages:

$match to select only documents that contain "UK" in the array
$unwind the array so each element can be considered separately
$match to eliminate "UK" documents from the stream
$group by country and count the number of occurrences

db.movies.aggregate([
  {$match: {countries: "UK"}},
  {$unwind: "$countries"},
  {$match: {countries: {$ne: "UK"}}},
  {$group: {
      _id: "$countries",
      count: {$sum: 1}
  }}
])
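To see what these four stages do step by step, here is a plain JavaScript emulation on a handful of sample documents. It is a sketch of the logic rather than how the server executes the pipeline; the helper `countCoCountries` is hypothetical, and every document is assumed to have an array-valued `countries` field.

```javascript
const movies = [
  { countries: ["UK", "New Zealand"] },
  { countries: ["USA", "UK"] },
  { countries: ["UK", "USA", "France"] },
  { countries: ["USA"] } // no "UK", so the first filter drops it
];

function countCoCountries(docs, target) {
  const counts = new Map();
  docs
    .filter(doc => doc.countries.includes(target)) // $match: array contains target
    .flatMap(doc => doc.countries)                 // $unwind: one entry per element
    .filter(country => country !== target)         // $match: drop the target itself
    .forEach(country =>                            // $group: count per country
      counts.set(country, (counts.get(country) || 0) + 1)
    );
  return counts;
}

const result = countCoCountries(movies, "UK");
// counts: New Zealand is 1, USA is 2, France is 1
console.log(Object.fromEntries(result));
```

Filtering before unwinding keeps the work small, because documents without "UK" are discarded before they are expanded into one document per country.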
