MalcolmOcean

Reputation: 2985

Get size in bytes of document in aggregation

I know about Object.bsonsize() but I'm not sure it'll work in my case. I'm wanting to efficiently figure out which users on my app have the most total data, and I have an aggregation pipeline that uses $lookup to collect all of the user's documents (scattered through 3 other collections) together. I then want a pipeline stage that looks something like:

$project: {
    "_id": 1,
    "username": 1, 
    "sizeInBytes": {
        $sizeInBytes: ...
    }
}

I'm pretty new to aggregation, so not actually sure what I'd want after sizeInBytes, to reference the whole document not just a property.

It looks like maybe in MongoDB 4.0+, this could be done using $toString and then $strLenBytes but I'm surprised I can't find a built-in way to do this much more directly. (And unfortunately I'm stuck on 3.6 atm)

Upvotes: 1

Views: 817

Answers (1)

buræquete

Reputation: 14678

Sadly, even with MongoDB 4.0+, calculating the size is messy. As you speculated, string length can be used as a workaround. There is an open ticket for a possible future feature that would do this within the aggregation pipeline.
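For readers on a newer server than the asker's: MongoDB 4.4 added a `$bsonSize` aggregation operator, which is not available on 3.6 or 4.0. A minimal sketch of the stage described in the question, assuming 4.4+ and assuming the `$lookup` stages have already run (the variable names `sizeStage` and `sortStage` are just for illustration):

```javascript
// Assumes MongoDB 4.4 or later; $bsonSize does not exist on 3.6 or 4.0.
// "$$ROOT" refers to the whole document at this point in the pipeline,
// including any arrays added by earlier $lookup stages.
const sizeStage = {
  $project: {
    _id: 1,
    username: 1,
    sizeInBytes: { $bsonSize: "$$ROOT" }
  }
};

// Largest users first.
const sortStage = { $sort: { sizeInBytes: -1 } };

// In the shell, these would be appended after the $lookup stages, e.g.
// db.user.aggregate([ /* $lookup stages */, sizeStage, sortStage ])
```

Because `$$ROOT` covers the joined document, the result counts the user's own fields plus everything pulled in by the lookups.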

With what you have, I suggest running JavaScript over the result of your $lookup stages.

For example:

db.user.aggregate([
  {
    $lookup: {
      from: "doc1",
      localField: "userId",
      foreignField: "userId",
      as: "doc1arr"
    }
  },
  {
    $lookup: {
      from: "doc2",
      localField: "userId",
      foreignField: "userId",
      as: "doc2arr"
    }
  },
  {
    $lookup: {
      from: "doc3",
      localField: "userId",
      foreignField: "userId",
      as: "doc3arr"
    }
  }
]).map(perUserData => ({
  userId: perUserData.userId,
  // Object.bsonsize() is a mongo shell helper; it measures the whole joined document in bytes
  size: Object.bsonsize(perUserData)
}));

This produces output like:

[
    {
        "userId" : 1,
        "size" : 250
    },
    {
        "userId" : 2,
        "size" : 350
    }
]
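Since the goal is to find the users with the most data, the mapped result can then be sorted by size. A minimal sketch, assuming the result is a plain array of `{ userId, size }` records (if your shell returns a cursor from `.map()`, convert it with `.toArray()` first); `rankBySize` is a hypothetical helper name, not a shell built-in:

```javascript
// Hypothetical helper: return the `limit` largest records, biggest first.
function rankBySize(records, limit = 10) {
  return records
    .slice()                          // copy so the input array is not mutated
    .sort((a, b) => b.size - a.size)  // descending by size in bytes
    .slice(0, limit);
}

// Sample data matching the shape of the output above.
const sample = [
  { userId: 1, size: 250 },
  { userId: 2, size: 350 }
];

const ranked = rankBySize(sample);
```

With the sample above, the record for userId 2 comes first. Sorting client-side like this is fine for a one-off report, but it does pull every joined document over the wire.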

See the non-js part on mongoplayground

Upvotes: 3
