Alvaro Luis Bustamante

Reputation: 8432

MongoDB skip stages on pipeline?

I want to know if there is any way of skipping stages in the aggregation pipeline; more concretely, stopping and returning as soon as one of the $lookup stages finds a match.

I need a query for retrieving "inherited" data from other types and/or groups. In this case I have three different collections: devices_properties, types_properties, and groups_properties, which store the properties for each device, type, or group.

If a device has a property defined, e.g., geofences, it can be read directly from devices_properties; if not, it is necessary to check its type and/or its group to see if it is defined there. If it is found on its type, then it is not necessary to check the group.

I have a query that works by checking its type/group and doing a $lookup over the different collections, then returning the appropriate document with a $switch. However, it is not optimal, as most of the time the property will be located in the first collection, devices_properties. In that case it performs two unnecessary lookups, since there is no need to resolve the device's type and group and check their respective properties. Not sure I explained it correctly.
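For illustration, hypothetical documents could look like this (the "my_type"/"my_group" names and the value fields are made up; the _id scheme matches the query below):

// devices: the device references its type and group
{ "_id": "alvarolb@esp32", "asset_type": "my_type", "asset_group": "my_group" }

// devices_properties: property defined directly on the device
{ "_id": "alvarolb@esp32:geofences", "value": ... }

// types_properties: first fallback, keyed as user@type:property
{ "_id": "alvarolb@my_type:geofences", "value": ... }

// groups_properties: second fallback, keyed as user@group:property
{ "_id": "alvarolb@my_group:geofences", "value": ... }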

The query I have right now is the following. Is there any way to optimize it, i.e., stop after the first $lookup if there is a match?

db.devices.aggregate([
    // select the device and build the property _ids of its group and type
    {"$match" : { "_id": "alvarolb@esp32"}},
    {"$project" : {
        "_id": false,
        "asset_group": {"$concat" : ["alvarolb", "@", "$asset_group", ":", "geofences"]},
        "asset_type": {"$concat" : ["alvarolb", "@", "$asset_type", ":", "geofences"]}
    }},
    // 1st lookup: property defined directly on the device
    {"$lookup" : {
        "from": "devices_properties",
        "pipeline": [
            {"$match" : {"_id": "alvarolb@esp32:geofences"}}
        ],
        "as": "device"
    }},
    { "$unwind": {
        "path": "$device",
        "preserveNullAndEmptyArrays": true
    }},
    // 2nd lookup: property defined on the device's group
    {"$lookup" : {
        "from": "groups_properties",
        "let" : {"asset_group" : "$asset_group"},
        "pipeline": [
            {"$match" : {"$expr" : { "$eq" : ["$_id", "$$asset_group"]}}}
        ],
        "as": "group"
    }},
    { "$unwind": {
        "path": "$group",
        "preserveNullAndEmptyArrays": true
    }},
    // 3rd lookup: property defined on the device's type
    {"$lookup" : {
        "from": "types_properties",
        "let" : {"asset_type" : "$asset_type"},
        "pipeline": [
            {"$match" : {"$expr" : { "$eq" : ["$_id", "$$asset_type"]}}}
        ],
        "as": "type"
    }},
    { "$unwind": {
        "path": "$type",
        "preserveNullAndEmptyArrays": true
    }},
    // return the first property found: device first, then type, then group
    {"$project" : {
        "value": {
            "$switch" : {
                "branches" : [
                    {"case": "$device", "then" : "$device"},
                    {"case": "$type", "then" : "$type"},
                    {"case": "$group", "then" : "$group"}
                ],
                "default": {}
            }
        }
    }},
    {"$replaceRoot": { "newRoot": "$value"}}
]);

Thanks!

Upvotes: 0

Views: 1428

Answers (1)

Alex Blex

Reputation: 37048

I doubt this particular query requires optimisation, but conditional stages in an aggregation pipeline are an interesting question in general.

First things first: in the first stage you select at most one document by an indexed field, which is already quite optimal. All your lookups do the same, so we are talking about a few dozen milliseconds for the whole pipeline, even on large collections. Is it worth optimising?
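If in doubt, you can check the plan from the mongo shell; for example, for the first stage (the _id field is always indexed):

// "executionStats" verbosity shows the winning plan and timings
db.devices.explain("executionStats").aggregate([
    {"$match": {"_id": "alvarolb@esp32"}}
])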

For the more generic case, when lookups are indeed expensive, you can employ a combination of $facet to run conditional pipelines and $concatArrays to merge the results.

The first lookup remains as is:

db.devices.aggregate([
     ....
     {"$lookup" : {
        "from": "devices_properties",
        "pipeline": [ 
            {"$match" : {"_id": "alvarolb@esp32:geofences"}},
        ],
        "as": "device"
    }},

Then we add an indicator of whether it returned any result, so we know if more lookups are needed:

{$addFields:{found: {$size: "$device"}}},

Then we define two pipelines in the facet: one with the next lookup, another without. The switch that decides which one runs is the first $match stage in each pipeline:

{$facet:{
    yes:[
        {$match: {"$expr" : {$gt:["$found", 0]}}},
    ],
    no:[
        {$match: {"$expr" : {$eq:["$found", 0]}}},
        {"$lookup" : {
            "from": "groups_properties",
            "let" : {"asset_group" : "$asset_group"},
            "pipeline": [ 
                {"$match" : {"$expr" : { "$eq" : ["$_id", "$$asset_group"]}}}
            ],
            "as": "group"
        }}
    ]
}},

After this stage we have two arrays, "yes" and "no", one of which is always empty. Merge both and convert back to top-level documents:

{$addFields: {yesno: {$concatArrays:["$yes", "$no"]}}},
{$unwind: "$yesno"},
{"$replaceRoot": { "newRoot": "$yesno"}},

Recalculate the indicator to reflect whether we have found anything so far:

{$addFields:{found: {$add: [ "$found", {$size: {$ifNull:["$group", []]}}]}}},

and repeat the same technique for the next lookup (a full sketch follows this list):

$facet with $lookup in `types_properties`
$addFields with $concatArrays
$unwind
$replaceRoot
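
A minimal sketch of that second round, reusing the pattern above with the field names from the question:

{$facet:{
    yes:[
        // property already found on the device or group: skip the lookup
        {$match: {"$expr" : {$gt:["$found", 0]}}}
    ],
    no:[
        // nothing found so far: check types_properties
        {$match: {"$expr" : {$eq:["$found", 0]}}},
        {"$lookup" : {
            "from": "types_properties",
            "let" : {"asset_type" : "$asset_type"},
            "pipeline": [
                {"$match" : {"$expr" : { "$eq" : ["$_id", "$$asset_type"]}}}
            ],
            "as": "type"
        }}
    ]
}},
{$addFields: {yesno: {$concatArrays:["$yes", "$no"]}}},
{$unwind: "$yesno"},
{"$replaceRoot": { "newRoot": "$yesno"}},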

then finalise with the projection/$replaceRoot, as in the original pipeline.
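One caveat: with this approach the matched properties remain in the "device", "group", and "type" arrays (there is no $unwind after each lookup), so the final projection has to pick the first array element rather than test the field directly. A sketch, assuming those field names:

{$project: {
    "value": {
        "$switch" : {
            "branches" : [
                // take the single element of whichever array is non-empty
                {"case": {$gt: [{$size: {$ifNull: ["$device", []]}}, 0]}, "then": {$arrayElemAt: ["$device", 0]}},
                {"case": {$gt: [{$size: {$ifNull: ["$type", []]}}, 0]}, "then": {$arrayElemAt: ["$type", 0]}},
                {"case": {$gt: [{$size: {$ifNull: ["$group", []]}}, 0]}, "then": {$arrayElemAt: ["$group", 0]}}
            ],
            "default": {}
        }
    }
}},
{"$replaceRoot": { "newRoot": "$value"}}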

Upvotes: 2
