Modermo
Modermo

Reputation: 1992

Conditionally $project from different fields

I have the unfortunate problem of a data structure changing on me around six months ago. Therefore, I have a document that used to look like...

{
  fruits: [
    {
      id: 123
    },
    {
      id: 456
    }
  ]
}

(Important to note that the id is not the ObjectId BSON type, it's just a random series of characters generated by the client side).

...but now has the id key changed to.

{
  fruits: [
    {
      fruit_id: 'xxx'
    },
    {
      fruit_id: 'yyy'
    }
  ]
}

So, I'm trying to do a $project to have both id and fruit_id changed to something generic like general_id, so that I can continue with another aggregation like $group and just reference the one field

I've tried something along the lines of:

[
  $unwind: {
    path: '$fruits'
  },
  $project: {
    general_id: {
      $cond: {
        if: {
          'fruits.fruit_id': {
            $type: ['string']
          }
        },
        then: '$fruits.fruit_id',
        else: '$fruits.id'
      }
    }
  }
]

Upvotes: 1

Views: 441

Answers (1)

Neil Lunn
Neil Lunn

Reputation: 151220

It really depends on what you are after here, but for a general case of knowing two possibilities it's probably better to use $ifNull to return the value for the field if present, otherwise return the other field's value.

Adding a little more data for demonstration since you probably don't want to loose anything else in the array elements:

{
  _id: 1,
  fruits: [
    {
      id: 123,
      data: 1
    },
    {
      id: 456,
      data: 2
    }
  ]
},
{
  _id: 2,
  fruits: [
    {
      fruit_id: 'xxx',
      data: 1
    },
    {
      fruit_id: 'yyy',
      data: 2
    }
  ]
},
{
  _id: 3,
  fruits: [
    {
      fruit_id: 'xxx',
      data: 1,
    },
    {
      fruit_id: 'yyy',
      data: 2
    },
    {
      id: 123,
      data: 3
    },
    {
      id: 456,
      data: 4
    }
  ]
}

Then you can either do the process using $unwind as the first step, which does make path naming easier and especially with $addFields instead of $project:

Model.aggregate([
  { "$unwind": "$fruits" },
  { "$addFields": {
    "fruits": {
      "id": "$$REMOVE",
      "fruit_id": "$$REMOVE",
      "general_id": { "$ifNull": [ "$fruits.id", "$fruits.fruit_id" ] }
    }
  }}
])

That uses $$REMOVE from MongoDB 3.6 and above ( which should be the minimal version you are using ) in order to "remove" the fields you don't want. You don't need to do that and can just declare everything you actually want with $project if you don't have support.

Then of course has the alternate with an $ifNull expression.

This gives results on that data like:

{ "_id" : 1, "fruits" : { "data" : 1, "general_id" : 123 } }
{ "_id" : 1, "fruits" : { "data" : 2, "general_id" : 456 } }
{ "_id" : 2, "fruits" : { "data" : 1, "general_id" : "xxx" } }
{ "_id" : 2, "fruits" : { "data" : 2, "general_id" : "yyy" } }
{ "_id" : 3, "fruits" : { "data" : 1, "general_id" : "xxx" } }
{ "_id" : 3, "fruits" : { "data" : 2, "general_id" : "yyy" } }
{ "_id" : 3, "fruits" : { "data" : 3, "general_id" : 123 } }
{ "_id" : 3, "fruits" : { "data" : 4, "general_id" : 456 } }

If you wanted to $group on that value, then there's no need for an intermediate "project" of any kind. Just do the $ifNull directly in that stage:

Model.aggregate([
  { "$unwind": "$fruits" },
  { "$group": {
    "_id": { "$ifNull": [ "$fruits.id", "$fruits.fruit_id" ] },
    "count": { "$sum": 1 }
  }}
])

And output:

{ "_id" : "yyy", "count" : 2 }
{ "_id" : "xxx", "count" : 2 }
{ "_id" : 456, "count" : 2 }
{ "_id" : 123, "count" : 2 }

Or if you actually did not need to $unwind the array for other purposes, you can use $map and some other manipulation with $objectToArray and $arrayToObject:

Model.aggregate([
  { "$addFields": {
    "fruits": {
      "$map": {
        "input": "$fruits",
        "in": {
          "$mergeObjects": [
            { "$arrayToObject": {
              "$filter": {
                "input": { "$objectToArray": "$$this" },
                "cond": { "$not": { "$in": [ "$$this.k", ["fruit_id","id"] ] } }
              }
            }},
            {
              "general_id": { "$ifNull": ["$$this.id","$$this.fruit_id"] }
            }
          ]
        }
      }
    }
  }}
])

Which returns results like:

{
        "_id" : 1,
        "fruits" : [
                {
                        "data" : 1,
                        "general_id" : 123
                },
                {
                        "data" : 2,
                        "general_id" : 456
                }
        ]
}
{
        "_id" : 2,
        "fruits" : [
                {
                        "data" : 1,
                        "general_id" : "xxx"
                },
                {
                        "data" : 2,
                        "general_id" : "yyy"
                }
        ]
}
{
        "_id" : 3,
        "fruits" : [
                {
                        "data" : 1,
                        "general_id" : "xxx"
                },
                {
                        "data" : 2,
                        "general_id" : "yyy"
                },
                {
                        "data" : 3,
                        "general_id" : 123
                },
                {
                        "data" : 4,
                        "general_id" : 456
                }
        ]
}

Adding an $unwind after that returns just the same as before. But the more complex operations are probably better suited to where you want to keep this as an array.

This time we removed the id and fruit_id by converting each array element into an array of "key/value" pairs via $objectToArray. We then $filter the array based on those "k" values, which are the names of the fields. The $arrayToObject makes this an object again, with all other content except those fields.

The $mergeObjects is to $map what $addFields is to the root "document", in that it takes multiple objects and "merges" them together. Thus the "filtered" object as described before, and the new object with only the general_id key and it's value translated from whichever field was present.

Lists of More than Two Fields

As a final note, $ifNull works better than $cond where you have just two values, but neither is actually that great if there is a larger possible list. You can nest $cond expressions or even use $switch, but really it's probably best to filter content out via the $objectToArray as seen before:

var valid_names = [ "id", "fruit_id", "apple_id", "orange_id" ];

Model.aggregate([
  { "$unwind": "$fruits" },
  { "$group": {
    "_id": {
      "$arrayElemAt": [
        { "$map": {
          "input": {
            "$filter": {
              "input": { "$objectToArray": "$fruits" },
              "cond": { "$in": [ "$$this.k", valid_names ] }
            }
          },
          "in": "$$this.v"
        }},
        0
      ]
    },
    "count": { "$sum": 1 }
  }}  
])

That usually makes the most sense, otherwise for working with such a list in dynamic way you end up building aggregation pipeline stages in code, such as with using $switch would be:

var valid_names = [ "id", "fruit_id", "apple_id", "orange_id" ];

var branches = valid_names.map(name => 
  ({
    "case": { "$gt": [`$fruits.${name}`, null ] },
    "then": `$fruits.${name}`
  })
)

Model.aggregate([
  { "$unwind": "$fruits" },
  { "$group": {
    "_id": { "$switch": { branches, "default": null } },
    "count": { "$sum": 1 }
  }}
])

Which looks cleaner in your code, but actually sends a much larger pipeline in BSON:

[
    { "$unwind" : "$fruits" },
    { "$group" : {
      "_id" : {
        "$switch" : {
          "branches" : [
           { 
             "case" : { "$gt" : [ "$fruits.id", null ] },
             "then" : "$fruits.id"
           },
           {
             "case" : { "$gt" : [ "$fruits.fruit_id", null ] },
             "then" : "$fruits.fruit_id"
           },
           {
             "case" : { "$gt" : [ "$fruits.apple_id", null ] },
             "then" : "$fruits.apple_id"
           },
           {
             "case" : { "$gt" : [ "$fruits.orange_id", null ] },
             "then" : "$fruits.orange_id"
           }
         ],
         "default" : null
        }
      },
      "count" : { "$sum" : 1 }
    }}
]

Upvotes: 1

Related Questions