Flo Woo
Flo Woo

Reputation: 989

Output from aggregate array of objects to document keys

I have some trade data like this

{
    "_id" : 1498290900.0,
    "trade" : {
        "type" : "Modify",
        "data" : {
            "type" : "bid",
            "rate" : "0.00658714",
            "amount" : "3.82354427"
        },
        "date" : 1498290930291.0,
        "name" : "TLX"
    }
},{
    "_id" : 1498290900.0,
    "trade" : {
        "type" : "Modify",
        "data" : {
            "type" : "ask",
            "rate" : "0.00658714",
            "amount" : "3.82354427"
        },
        "date" : 1498290930291.0,
        "name" : "TLX"
    }
},{
    "_id" : 1498290900.0,
    "trade" : {
        "type" : "Remove",
        "data" : {
            "type" : "ask",
            "rate" : "0.00680891"
        },
        "date" : 1498290931349.0,
        "name" : "TLX"
     }
}

These come from $rewind hence the _id being same. Want i want to do next is group them on _id so i try

{ 
    $group: { 
      _id: {_id: "$_id", name: "$trade.type",dtype: "$trade.data.type"},
        count          : {$sum: 1}
    },
  },{$project: { _id: "$_id._id", type: "$_id.name", count: 1, dtype: "$_id.dtype" } },
  {

      $group: {
          _id: "$_id",
          results: { $push : "$$ROOT" }
          }
  }

Which is quite good, give me the below

 {
    "_id" : 1498276800.0,
    "results" : [ 
        {
            "count" : 16.0,
            "_id" : 1498276800.0,
            "type" : "Modify",
            "dtype" : "bid"
        }, 
        {
            "count" : 15.0,
            "_id" : 1498276800.0,
            "type" : "Remove",
            "dtype" : "bid"
        }, 
        {
            "count" : 3.0,
            "_id" : 1498276800.0,
            "type" : "Remove",
            "dtype" : "ask"
        }, 
        {
            "count" : 1.0,
            "_id" : 1498276800.0,
            "type" : "Modify",
            "dtype" : "ask"
        }
    ]
}

But i was trying to make the output more like this

  {
    "_id" : 1498276800.0,
    "Modify": {
      "bid":{
        "count": 16.0
      },
      "ask": {
        "count": 1.0
      }
    },
    "Remove": {
      "bid":{
        "count": 15.0
      },
      "ask": {
        "count": 3.0
      }
    }
  }

But no amount of playing with $projections has got me close.

Can anyone point me in the right direction please?

thanks.

UPDATE

Excluding last pipeline stage, this is example documents with nice bid/ask per type ready to be grouped by _id.

{
    "_id" : {
        "_id" : 1498276800.0,
        "type" : "orderBookRemove"
    },
    "results" : [ 
        {
            "k" : "bid",
            "v" : {
                "count" : 15.0
            }
        }, 
        {
            "k" : "ask",
            "v" : {
                "count" : 3.0
            }
        }
    ]
},
{
    "_id" : {
        "_id" : 1498276800.0,
        "type" : "orderBookModify"
    },
    "results" : [ 
        {
            "k" : "bid",
            "v" : {
                "count" : 16.0
            }
        }, 
        {
            "k" : "ask",
            "v" : {
                "count" : 1.0
            }
        }
    ]
}

When last part of pipeline is applied i.e

{ "$group": {
    "_id": "$_id._id",
    "results": {
      "$push": {
        "k": "$_id.type",
        "v": "$results"
      }
    }
  }}

I get this, only first 'bid' element of results array. 2nd item 'ask' goes AWOL ?

{
    "_id" : 1498280700.0,
    "results" : [ 
        {
            "k" : "orderBookRemove",
            "v" : [ 
                {
                    "k" : "bid",
                    "v" : {
                        "count" : 9.0
                    }
                }
            ]
        }, 
        {
            "k" : "orderBookModify",
            "v" : [ 
                {
                    "k" : "bid",
                    "v" : {
                        "count" : 6.0
                    }
                }
            ]
        }
    ]
}

Upvotes: 1

Views: 78

Answers (1)

Neil Lunn
Neil Lunn

Reputation: 151190

This entirely depends on the MongoDB version you have available, or not really, depending on how you look at it. As you say the data is actually from an array originally so I am going to start in that format, and process each option from there.

The source being considered then being:

{
        "_id" : ObjectId("594f3a530320738061df3eea"),
        "data" : [
                {
                        "_id" : 1498290900,
                        "trade" : {
                                "type" : "Modify",
                                "data" : {
                                        "type" : "bid",
                                        "rate" : "0.00658714",
                                        "amount" : "3.82354427"
                                },
                                "date" : 1498290930291,
                                "name" : "TLX"
                        }
                },
                {
                        "_id" : 1498290900,
                        "trade" : {
                                "type" : "Modify",
                                "data" : {
                                        "type" : "ask",
                                        "rate" : "0.00658714",
                                        "amount" : "3.82354427"
                                },
                                "date" : 1498290930291,
                                "name" : "TLX"
                        }
                },
                {
                        "_id" : 1498290900,
                        "trade" : {
                                "type" : "Remove",
                                "data" : {
                                        "type" : "ask",
                                        "rate" : "0.00680891"
                                },
                                "date" : 1498290931349,
                                "name" : "TLX"
                        }
                }
        ]
}

MongoDB 3.4

Simply use $replaceRoot and $arrayToObject with some careful placement of results:

db.dtest.aggregate([
  { "$unwind": "$data" },
  { "$group": { 
   "_id": {
     "_id": "$data._id", 
     "type": "$data.trade.type",
     "dtype": "$data.trade.data.type"
   },
   "count": { "$sum": 1 }
  }},
  { "$group": {
    "_id": {
      "_id": "$_id._id",
      "type": "$_id.type"
    },
    "results": {
      "$push": {
        "k": "$_id.dtype",
        "v": {
          "count": "$count"
        }
      }         
    }
  }},
  { "$group": {
    "_id": "$_id._id",
    "results": {
      "$push": {
        "k": "$_id.type",
        "v": "$results"
      }
    }
  }},
  { "$replaceRoot": {
    "newRoot": {
      "$arrayToObject": {
        "$concatArrays": [
          [{ "k": "_id", "v": "$_id" }],
          { "$map": {
            "input": "$results",
            "as": "r",
            "in": {
              "k": "$$r.k",
              "v": { "$arrayToObject": "$$r.v" }
            }
          }}
        ]
      }
    }
  }}
])

All versions

Which is what probably makes the most sense in most cases where we simply do the aggregated array form and do the transformation in the client instead. We don't really need additional aggregation since that part is already done, so it's not like we are reducing the data any further.

Simple to do in most languages, but as a basic JavaScript concept that works in the shell:

db.dtest.aggregate([
  { "$unwind": "$data" },
  { "$group": { 
   "_id": {
     "_id": "$data._id", 
     "type": "$data.trade.type",
     "dtype": "$data.trade.data.type"
   },
   "count": { "$sum": 1 }
  }},
  { "$group": {
    "_id": {
      "_id": "$_id._id",
      "type": "$_id.type"
    },
    "results": {
      "$push": {
        "k": "$_id.dtype",
        "v": {
          "count": "$count"
        }
      }         
    }
  }},
  { "$group": {
    "_id": "$_id._id",
    "results": {
      "$push": {
        "k": "$_id.type",
        "v": "$results"
      }
    }
  }}
]).map(doc => 
  doc.results.map(r => 
    ({ k: r.k, v: r.v.reduce((acc,curr) =>
      Object.assign(acc, { [curr.k]: curr.v }),{}) 
    })
  ).reduce((acc,curr) => 
    Object.assign(acc, { [curr.k]: curr.v }),{ _id: doc._id })
)

Which is essentially doing the same thing whilst processing the cursor as out new fancy pipeline stage is doing for each document.

So it really just goes to show that unless you intend to aggregate the results further along than from even this result, it simply is not necessary to use the fancy new operators. The same thing is achieved in what is considerably less lines of code and is far less terse to express.

Both output the same thing:

    {
            "_id" : 1498290900,
            "Modify" : {
                    "ask" : {
                            "count" : 1
                    },
                    "bid" : {
                            "count" : 1
                    }
            },
            "Remove" : {
                    "ask" : {
                            "count" : 1
                    }
            }
    }

Debug - to remove

Taking the data from your update I apply this:

db.test.aggregate([
  { "$group": {
    "_id": "$_id._id",
    "results": {
      "$push": {
        "k": "$_id.type",
        "v": "$results"
      }
    }
  }},
  { "$replaceRoot": {
    "newRoot": {
      "$arrayToObject": {
        "$concatArrays": [
          [{ "k": "_id", "v": "$_id" }],
          { "$map": {
            "input": "$results",
            "as": "r",
            "in": {
              "k": "$$r.k",
              "v": { "$arrayToObject": "$$r.v" }
            }
          }}
        ]
      }
    }
  }}
 ])

And get the expected output:

{
    "_id" : 1498276800.0,
    "orderBookRemove" : {
        "bid" : {
            "count" : 15.0
        },
        "ask" : {
            "count" : 3.0
        }
    },
    "orderBookModify" : {
        "bid" : {
            "count" : 16.0
        },
        "ask" : {
            "count" : 1.0
        }
    }
}

So your claimed output is false, and you are not following the example.

Upvotes: 1

Related Questions