peterkin
peterkin

Reputation: 37

Mongo DB aggregate with embedded documents

I have a product collection like this, simplified:

[
  {
    "_id": 1,
    "ref": "product 1",
    "variants": [
      {
        "ref": "variant 1.1",
        "categories": ["category a"]
      },
      {
        "ref": "variant 1.1",
        "categories": ["category a","category b"]
      }
    ]
  },
  {
    "_id": 2,
    "ref": "product 2",
    "variants": [
      {
        "ref": "variant 2.1",
        "categories": ["category c"]
      },
      {
        "ref": "variant 2.1",
        "categories": ["category a","category c"]
      }
    ]
  }
]

I want to query for categories (distinct), with their number of containing products (not variants).

For example some result like this:

[
  "category a": 2,
  "category b": 1,
  "category c": 1
]

I tried some queries with aggregate and unwind, but I cannot figure it out. All help appreciated!

This is what I have so far:

[
  {$match: ... }, // optional filtering
  {$unwind: '$variants'},
  {$unwind: '$variants.categories'},
]

But cannot figure out now, howto group by category, with a aggregate count of all products (not variants) inside that category.

Upvotes: 2

Views: 886

Answers (1)

Sergey Berezovskiy
Sergey Berezovskiy

Reputation: 236188

db.products.aggregate([
    {$unwind: "$variants"},
    {$unwind: "$variants.categories"},
    {$group: {_id:"$_id", categories: {$addToSet:"$variants.categories"}}},
    {$unwind: "$categories"},
    {$group: {_id: "$categories", count: {$sum:1}}}
])

Output:

{ "_id" : "category b",  "count" : 1 }
{ "_id" : "category c",  "count" : 1 }
{ "_id" : "category a",  "count" : 2 }

Explanation. First two unwind operators will bring categories out of nested arrays and you will have documents like this

{
    "_id" : 1,
    "ref" : "product 1",
    "variants" : {
        "ref" : "variant 1.1",
        "categories" : "category a"
    }
},
{
    "_id" : 1,
    "ref" : "product 1",
    "variants" : {
        "ref" : "variant 1.1",
        "categories" : "category a"
    }
},
{
    "_id" : 1,
    "ref" : "product 1",
    "variants" : {
        "ref" : "variant 1.1",
        "categories" : "category b"
    }
},
...

Next I do grouping to eliminate duplicated categories from each product variants. Result:

{
    "_id" : 1,
    "categories" : [ 
        "category b", 
        "category a"
    ]
},
...

One more unwind to get rid of categories array.

{
    "_id" : 1,
    "categories" : "category b"
},
{
    "_id" : 1,
    "categories" : "category a"
},
{
    "_id" : 2,
    "categories" : "category a"
},
{
    "_id" : 2,
    "categories" : "category c"
}

And then grouping to calculate count of distinct categories in each product. You will have output as specified above.

Upvotes: 2

Related Questions