black_sheep07
black_sheep07

Reputation: 2368

Combining Unique Items From Arrays

I have a data set that I am querying. The data looks like this:

db.activity.insert(
    {
        "_id" : ObjectId("5908e64e3b03ca372dc945d5"),
        "startDate" : ISODate("2017-05-06T00:00:00Z"),
        "details" : [
            {
                "code" : "2",
                "_id" : ObjectId("5908ebf96ae5003a4471c9b2"),
                "walkDistance" : "03",
                "jogDistance" : "01",
                "runDistance" : "08",
                "sprintDistance" : "01"
            }
        ]
    }
)

db.activity.insert(
    {
        "_id" : ObjectId("58f79163bebac50d5b2ae760"),
        "startDate" : ISODate("2017-05-07T00:00:00Z"),
        "details" : [
            {
                "code" : "2",
                "_id" : ObjectId("58f7948fbebac50d5b2ae7f2"),
                "walkDistance" : "01",
                "jogDistance" : "02",
                "runDistance" : "09",
                "sprintDistance" : ""
            }
        ]
    }
)

My desired output looks as such:

[
  {
    "_id": null,
    "uniqueValues": [
      "03",
      "01",
      "08",
      "02",
      "09"
    ]
  }
]

In order to do that, I've developed the following code:

db.activity.aggregate([
    {
        $facet: {
            "walk": [
                {$unwind: '$details'},
                {$group: {_id: null, uniqueValues: {$addToSet: "$details.walkDistance"}}}
            ], "jog": [
                {$unwind: '$details'},
                {$group: {_id: null, uniqueValues: {$addToSet: "$details.jogDistance"}}}
            ], "run": [
                {$unwind: '$details'},
                {$group: {_id: null, uniqueValues: {$addToSet: "$details.runDistance"}}}
            ], "sprint": [
                {$unwind: '$details'},
                {$group: {_id: null, uniqueValues: {$addToSet: "$details.sprintDistance"}}}
            ]
        }
    }])

However, I am still getting 4 different facets with their own _id: null and uniqueValues array. How do I change the query so that they all included in a single array, and the "" is also excluded.

Upvotes: 3

Views: 3569

Answers (1)

Neil Lunn
Neil Lunn

Reputation: 151112

$facet really is not the best thing to use here. You should really just be applying $concatArrays and filtering down the result with $setDifference and $filter:

db.activity.aggregate([
  { "$project": {
    "_id": 0,
    "unique": {
      "$filter": {
        "input": {
          "$setDifference": [
            { "$concatArrays": [ 
              "$details.walkDistance",
              "$details.jogDistance",
              "$details.runDistance",
              "$details.sprintDistance"
            ]},
            []
          ]
        },
        "cond": { "$ne": [ "$$this", "" ] }
      }
    }
  }},
  { "$unwind": "$unique" },
  { "$group": {
    "_id": null,
    "uniqueArray": { "$addToSet": "$unique" }  
  }}
])

Returns the result:

/* 1 */
{
    "_id" : null,
    "uniqueArray" : [ 
        "09", 
        "03", 
        "01", 
        "02", 
        "08"
    ]
}

So after bringing all the array values into a single array using $concatArrays, you apply $setDifference to reduce the list to the "unique" values. The $filter removes the "" values you don't want.

Then it's just a matter of applying $unwind on the singular and reduced list and bringing it back together in the $group with $addToSet to only keep unique values across documents.

You could also just $concatArrays only and then $unwind and $match, but the other operators don't really cost much and reduce some of the load by already narrowing down to "unique" within the document before you get to the $unwind. So it's better to do it that way.

Really this can even be broken down futher, to simply $setUnion and $setDifference since we are talking about "sets" afterall:

db.activity.aggregate([
  { "$project": {
    "_id": 0,
    "unique": {
      "$setDifference": [
        { "$setUnion": [ 
          "$details.walkDistance",
          "$details.jogDistance",
          "$details.runDistance",
          "$details.sprintDistance"
        ]},
        [""]
      ]
    }
  }},
  { "$unwind": "$unique" },
  { "$group": {
    "_id": null,
    "uniqueArray": { "$addToSet": "$unique" }  
  }}
])

And that means that the overall statement becomes compatible back to MongoDB 2.6, or would be if all the forms such as $details.walkDistance were written out in their longer form using $map:

  "$setDifference": [
    { "$setUnion": [ 
      { "$map": { "input": "$details", "as": "d", "in": "$$d.walkDistance" } },
      { "$map": { "input": "$details", "as": "d", "in": "$$d.jogDistance" } },
      { "$map": { "input": "$details", "as": "d", "in": "$$d.runDistance" } },
      { "$map": { "input": "$details", "as": "d", "in": "$$d.sprintDistance" } }
    ]},
    [""]
  ]

On the other hand running $facet causes a "brute force" parse through the whole collection for every property from within the array, and $unwind being processed on each of those passes. So it's a really inefficient way to obtain the result. So don't do it that way.

Upvotes: 6

Related Questions