Adrian Pop
Adrian Pop

Reputation: 1967

MongoDB aggregation query optimization: $match, $lookup and double $unwind

Let's say we have two collections:

Extra info:

My job is to create a query to generate a summary of all the existing cards in all of those devices, each entry being enriched with information from the interfaces collection. I also need to be able to parametrize the query (in case I'm interested only in a certain device with a certain name, only a certain model for cards etc.)

So far, I have this:

mongo_client.devices.aggregate([
    # Retrieve all the devices having the cards field
    {
        "$match": {
            # "name": "<device-name>",
            "cards": {
                "$exists": "true"
            }
        }
    },
    
    # Group current content with every cards object
    {
         "$unwind": "$cards"
    },
    
    # Only take the ones having "slot" a number
    {
        "$match": {
            "cards.slot": {
                "$regex": "^\d+$"
            }
        }
    },
    
    # Retrieve the device's interfaces
    {
        "$lookup": {
            "from": "interfaces",
            "let": {
                "owner": "$name",
            },
            "as": "interfaces",
            "pipeline": [{
                "$match": {
                    "$expr": {
                        "$eq": ["$owner", "$$owner"]
                    },
                },
            }]
        }
    },
    
    {
        "$unwind": "$interfaces"
    },
    
    {
        "$match": {
            "$expr": {
                "$eq": ["$interfaces.name", {
                    "$concat": ["s", "$cards.slot", "p1"]
                }]
            }
        }
    },
    
    # Build the final object
    {
        "$project": {
            # Card related fields
            "slot": "$cards.slot",
            "model": "$cards.model",

            
            # Device related fields
            "device_name": "$name",
           
            # Fields from interfaces
           "interface_field_x": "$interfaces.interface_field_x",
           "interface_field_y": "$interfaces.interface_field_y",
        }
    },
])

The query works and it's quite fast, but I have a question:

  1. Is there any way I can avoid the 2nd $unwind? If for every device there are 50-150 interface objects where owner is the name of that device, I feel that I'm slowing it down. Every device has a unique interface named s[slot]p1. How can I get that specific object in a better way? I tried to use two $eq expressions in the $match inside the $lookup or even $regex or $regexMatch, but I couldn't use the outside slot fields, even if I put it inside let.

  2. If I want to parametrize my query to filter the data if needed, would you add match expressions as intermediary steps or just filter at the end?

Any other improvements to the query are welcome. I'm also interested in how to make it errors proof (if by mistake cards is missing or that s1p1 interface is not found.

Thanks!

Upvotes: 0

Views: 588

Answers (1)

D. SM
D. SM

Reputation: 14530

Your question is missing sample data for the query, but:

  • Merge the third stage into the first stage, get rid of $exists
  • Instead of pipeline use localField+foreignField, pipeline is much slower

The number of unwinds in the query should correspond to what objects you want in the result set:

  • 0 unwinds for devices
  • 1 unwind for cards
  • 2 unwinds for interfaces

To match the desired conditions no unwinds are needed.

Upvotes: 1

Related Questions