Reputation: 8616

How to update a document with a reference to its previous state?

Is it possible to reference the root document during an update operation such that a document like this:

{"name":"foo","value":1}

can be updated with new values and have the full (previous) document pushed into a new field (creating an update history):

{"name":"bar","value":2,"previous":[{"name":"foo","value":1}]}

And so on.

{"name":"baz","value":3,"previous":[{"name":"foo","value":1},{"name":"bar","value":2}]}

I figure I'll have to use the new $set aggregation operator in Mongo 4.2, but how can I achieve this?

Ideally I don't want to have to reference each field explicitly. I'd prefer to push the root document (minus the _id and previous fields) without knowing what the other fields are.

Upvotes: 3

Answers (3)

Sourav De

Reputation: 99

 addMultipleData: async (req, res, next) => {
        const { name, value } = req.body;
        // Validate once and return immediately, so at most one response is ever sent
        if (!name) { return res.status(400).json({ message: "Please enter Name" }); }
        if (value === undefined || value === "") { return res.status(400).json({ message: "Please Enter Value" }); }
        try {
            //Step 1: fetch the current document (this approach assumes the collection holds a single document)
            const current = await models.dynamic.findOne({});
            if (current == null) {
                //Step 2: first write, so there is no history to push yet
                await models.dynamic.create({ name: name, value: value });
            }
            else {
                //Step 3: overwrite name/value and push the old values onto "previous".
                // Note: another request may modify the document between Step 1 and Step 3.
                const updateWith = {
                    $set: { name: name, value: value },
                    $push: { previous: { name: current.name, value: current.value } }
                };
                await models.dynamic.updateOne({ _id: current._id }, updateWith);
            }
            res.json({ message: "successfully inserted" });
        } catch (error) {
            // Hand database errors to Express instead of leaving the request hanging
            next(error);
        }
    }
  //This is the schema (models.dynamic is assumed to be a model built from it)
   var multipleAddSchema = mongoose.Schema({
     "name":String,
     "value":Number,
     "previous":[]
    })

Upvotes: 0

Markus W Mahlberg

Reputation: 20703

IMHO, you are making your life unnecessarily complex, for no real benefit, with such a complicated data model.

Think of what you really want to achieve. You want to correlate different values in one or more interconnected series which are written to the collection consecutively.

Storing this in one document comes with some strings attached. While it may seem reasonable at first, let me name a few:

How do you get the most current document if you do not know its value for name?
How do you deal with very large series, which would make the document exceed the 16MB limit?
What is the benefit of the added complexity?

Simplify first

So, let's assume you have only one series for a moment. It gets as simple as

[{
  "_id":"foo",
  "ts": ISODate("2019-07-03T17:40:00.000Z"),
  "value":1
},{
  "_id":"bar",
  "ts": ISODate("2019-07-03T17:45:00.000Z"),
  "value":2
},{
  "_id":"baz",
  "ts": ISODate("2019-07-03T17:50:00.000Z"),
  "value":3
}]

Assuming the name is unique, we can use it as _id, potentially saving an index.

You can get the semantic equivalent with a simple query:

> db.seriesa.find().sort({ts:-1})
{ "_id" : "baz", "ts" : ISODate("2019-07-03T17:50:00Z"), "value" : 3 }
{ "_id" : "bar", "ts" : ISODate("2019-07-03T17:45:00Z"), "value" : 2 }
{ "_id" : "foo", "ts" : ISODate("2019-07-03T17:40:00Z"), "value" : 1 }

Say you only want to have the two latest values, you can use limit():

> db.seriesa.find().sort({ts:-1}).limit(2)
{ "_id" : "baz", "ts" : ISODate("2019-07-03T17:50:00Z"), "value" : 3 }
{ "_id" : "bar", "ts" : ISODate("2019-07-03T17:45:00Z"), "value" : 2 }
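For readers who want to see what these two queries return without a running server, here is a rough in-memory equivalent in plain JavaScript. It is only an illustration, assuming the three sample documents above (with Date objects standing in for ISODate); `latest` is a hypothetical helper, not part of any MongoDB API.

```javascript
// Plain-JavaScript emulation of db.seriesa.find().sort({ ts: -1 }).limit(n).
const seriesa = [
  { _id: "foo", ts: new Date("2019-07-03T17:40:00.000Z"), value: 1 },
  { _id: "bar", ts: new Date("2019-07-03T17:45:00.000Z"), value: 2 },
  { _id: "baz", ts: new Date("2019-07-03T17:50:00.000Z"), value: 3 },
];

// Hypothetical helper: newest first, then keep at most n documents.
function latest(docs, n = Infinity) {
  return [...docs]                  // copy so the input array is not reordered
    .sort((a, b) => b.ts - a.ts)    // descending by timestamp, like { ts: -1 }
    .slice(0, n);                   // like .limit(n)
}

console.log(latest(seriesa, 2).map((d) => d._id)); // [ 'baz', 'bar' ]
```

Calling `latest(seriesa)` without a limit corresponds to the first query and returns all three documents, newest first.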

Should you really need to have the older values in a queue-ish array, you can use an aggregation:

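db.seriesa.aggregate([{
  // $last depends on the input order, so sort chronologically first
  $sort: {
    ts: 1
  }
}, {
  $group: {
    _id: "queue",
    name: {
      $last: "$_id"
    },
    value: {
      $last: "$value"
    },
    previous: {
      $push: {
        name: "$_id",
        value: "$value"
      }
    }
  }
}, {
  $project: {
    name: 1,
    value: 1,
    // drop the newest entry, which is already in name/value
    previous: {
      $slice: ["$previous", {
        $subtract: [{
          $size: "$previous"
        }, 1]
      }]
    }
  }
}])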
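To make the pipeline's steps concrete, here is a minimal in-memory sketch of what it computes: sort by ts, collect every document into one group, keep the newest name/value, and drop that newest entry from "previous". It assumes documents shaped like the single-series sample above; `toQueue` is a hypothetical helper used only for illustration.

```javascript
// In-memory sketch of the "queue" pipeline ($sort, $group with $last/$push, $project with $slice).
function toQueue(docs) {
  const ordered = [...docs].sort((a, b) => a.ts - b.ts);                  // { $sort: { ts: 1 } }
  if (ordered.length === 0) return null;                                  // $group emits nothing for empty input
  const previous = ordered.map((d) => ({ name: d._id, value: d.value })); // $push
  const last = ordered[ordered.length - 1];                               // $last
  return {
    _id: "queue",
    name: last._id,
    value: last.value,
    previous: previous.slice(0, previous.length - 1),                     // $slice: size - 1
  };
}

const sample = [
  { _id: "foo", ts: new Date("2019-07-03T17:40:00.000Z"), value: 1 },
  { _id: "bar", ts: new Date("2019-07-03T17:45:00.000Z"), value: 2 },
  { _id: "baz", ts: new Date("2019-07-03T17:50:00.000Z"), value: 3 },
];
console.log(JSON.stringify(toQueue(sample)));
// {"_id":"queue","name":"baz","value":3,"previous":[{"name":"foo","value":1},{"name":"bar","value":2}]}
```

The result has the same shape as the document the question asks for, but it is derived at read time instead of being maintained on every write.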

Nail it

Now, let us say you have more than one series of data. Basically, there are two ways of dealing with it: put different series in different collections, or put all the series in one collection and distinguish them by a field, which for obvious reasons should be indexed.

So, when to use what? It boils down to whether or not you want to do aggregations over all series (maybe later down the road). If you do, you should put all series into one collection. Of course, we have to slightly modify our data model:

[{
  "name":"foo",
  "series": "a",
  "ts": ISODate("2019-07-03T17:40:00.000Z"),
  "value":1
},{
  "name":"bar",
  "series": "a",
  "ts": ISODate("2019-07-03T17:45:00.000Z"),
  "value":2
},{
  "name":"baz",
  "series": "a",
  "ts": ISODate("2019-07-03T17:50:00.000Z"),
  "value":3
},{
  "name":"foo",
  "series": "b",
  "ts": ISODate("2019-07-03T17:40:00.000Z"),
  "value":1
},{
  "name":"bar",
  "series": "b",
  "ts": ISODate("2019-07-03T17:45:00.000Z"),
  "value":2
},{
  "name":"baz",
  "series": "b",
  "ts": ISODate("2019-07-03T17:50:00.000Z"),
  "value":3
}]

Note that for demonstration purposes, I fell back to the default ObjectId value for _id.

Next, we create an index over series and ts, as we are going to need it for our query:

> db.series.createIndex({series:1,ts:-1})

And now our simple query looks like this:

> db.series.find({"series":"b"},{_id:0}).sort({ts:-1})
{ "name" : "baz", "series" : "b", "ts" : ISODate("2019-07-03T17:50:00Z"), "value" : 3 }
{ "name" : "bar", "series" : "b", "ts" : ISODate("2019-07-03T17:45:00Z"), "value" : 2 }
{ "name" : "foo", "series" : "b", "ts" : ISODate("2019-07-03T17:40:00Z"), "value" : 1 }

To generate the queue-like document for a single series, we need to add a $match stage:

> db.series.aggregate([{
    $match: {
      "series": "b"
    }
  },
  // other stages omitted for brevity
  ])

Note that the index we created earlier will be utilized here.

Alternatively, we can generate such a document for every series by using "$series" as the _id in the $group stage and replacing "$_id" with "$name" where appropriate:

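db.series.aggregate([{
  // $last depends on the input order, so sort chronologically first
  $sort: {
    ts: 1
  }
}, {
  $group: {
    _id: "$series",
    name: {
      $last: "$name"
    },
    value: {
      $last: "$value"
    },
    previous: {
      $push: {
        name: "$name",
        value: "$value"
      }
    }
  }
}, {
  $project: {
    name: 1,
    value: 1,
    // drop the newest entry, which is already in name/value
    previous: {
      $slice: ["$previous", {
        $subtract: [{
          $size: "$previous"
        }, 1]
      }]
    }
  }
}])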
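The per-series variant can be sketched the same way in plain JavaScript, assuming documents that carry name, series, ts and value as in the sample data above. `queuesBySeries` and `sampleSeries` are hypothetical names for illustration only; a Map plays the role of the $group stage.

```javascript
// Rough in-memory equivalent of grouping by "$series": one queue document per series.
function queuesBySeries(docs) {
  const groups = new Map();
  // Sort once by ts so that "last" really means "newest" inside each group.
  for (const d of [...docs].sort((a, b) => a.ts - b.ts)) {
    if (!groups.has(d.series)) groups.set(d.series, []);
    groups.get(d.series).push({ name: d.name, value: d.value }); // $push per group
  }
  // $group does not guarantee output order; the Map simply keeps insertion order here.
  return [...groups].map(([series, entries]) => ({
    _id: series,
    name: entries[entries.length - 1].name,   // $last
    value: entries[entries.length - 1].value, // $last
    previous: entries.slice(0, -1),           // everything except the newest entry
  }));
}

const sampleSeries = [
  { name: "foo", series: "a", ts: new Date("2019-07-03T17:40:00Z"), value: 1 },
  { name: "bar", series: "a", ts: new Date("2019-07-03T17:45:00Z"), value: 2 },
  { name: "foo", series: "b", ts: new Date("2019-07-03T17:40:00Z"), value: 1 },
  { name: "baz", series: "b", ts: new Date("2019-07-03T17:50:00Z"), value: 3 },
];
console.log(JSON.stringify(queuesBySeries(sampleSeries)));
```

As with the server-side version, callers that need a stable order should look results up by _id (or add a $sort after the $group) rather than rely on the output position.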

Conclusion

Stop Being Clever when it comes to data models in MongoDB. Most of the data-model problems I have seen in the wild, and the vast majority of those I see on SO, come from someone trying to be Smart™ through premature optimization.

Unless we are talking about ginormous series (which cannot be the case here, since your approach already caps each series at the 16MB document limit), the data model and queries above are efficient and add no unneeded complexity.

Upvotes: 0

Xavier Guihot

Reputation: 61666

In addition to the new $set stage, what makes your use case much easier in Mongo 4.2 is that db.collection.update() now accepts an aggregation pipeline, finally allowing a field to be updated based on the document's current values:

// { name: "foo", value: 1 }
db.collection.update(
  {},
  [{ $set: {
     previous: {
       $ifNull: [
         { $concatArrays: [ "$previous", [{ name: "$name", value: "$value" }] ] },
         [ { name: "$name", value: "$value" } ]
       ]
     },
     name: "bar",
     value: 2
  }}],
  { multi: true }
)
// { name: "bar", value: 2, previous: [{ name: "foo", value: 1 }] }
// and if applied again:
// { name: "baz", value: 3, previous: [{ name: "foo", value: 1 }, { name: "bar", value: 2 } ] }

The first part {} is the match query, filtering which documents to update (in our case probably all documents).
The second part [{ $set: { previous: { $ifNull: [ ... ] }, ... } }] is the update aggregation pipeline (note the square brackets signifying the use of an aggregation pipeline):
- $set is a new aggregation stage and an alias of $addFields. It is used to add or replace a field (in our case "previous") with values computed from the current document. All expressions inside one $set stage are evaluated against the document as it was before the stage, so "$name" and "$value" still refer to the old values even though name and value are overwritten in the same stage.
- Using an $ifNull check, we can determine whether "previous" already exists in the document (this is not the case for the first update): $concatArrays returns null when one of its arguments refers to a missing field.
- If "previous" doesn't exist (is null), then we have to create it and set it to an array of one element, the current document: [ { name: "$name", value: "$value" } ].
- If "previous" already exists, then we concatenate ($concatArrays) the existing array with the current document.
Don't forget { multi: true }, otherwise only the first matching document will be updated.
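To check the logic without a server, the update can be modelled on plain JavaScript objects. This is only an illustrative sketch, not MongoDB code: `pipelineUpdate` is a hypothetical helper, and the null that $concatArrays yields for a missing "previous" is modelled explicitly so that the $ifNull fallback is visible.

```javascript
// Plain-JavaScript model of the 4.2 pipeline update above (no server involved).
function pipelineUpdate(doc, newName, newValue) {
  // Snapshot taken from the pre-update document, like { name: "$name", value: "$value" }.
  const snapshot = { name: doc.name, value: doc.value };
  // $concatArrays: null when "previous" is missing, otherwise the extended array.
  const concatenated = Array.isArray(doc.previous) ? [...doc.previous, snapshot] : null;
  return {
    ...doc,
    previous: concatenated ?? [snapshot], // $ifNull: fall back to a one-element array
    name: newName,
    value: newValue,
  };
}

let doc = { name: "foo", value: 1 };
doc = pipelineUpdate(doc, "bar", 2);
doc = pipelineUpdate(doc, "baz", 3);
console.log(JSON.stringify(doc));
// {"name":"baz","value":3,"previous":[{"name":"foo","value":1},{"name":"bar","value":2}]}
```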

As you mentioned "root" in your question, and if your schema is not the same for all documents (so you can't tell which fields should be pushed into the "previous" array), you can use the $$ROOT variable, which represents the current document, and filter out the _id field and the "previous" array. In this case, replace both occurrences of { name: "$name", value: "$value" } in the previous query with:

{ $arrayToObject: { $filter: {
     input: { $objectToArray: "$$ROOT" },
     as: "root",
     cond: { $not: [ { $in: [ "$$root.k", [ "_id", "previous" ] ] } ] }
}}}
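The $objectToArray / $filter / $arrayToObject round trip has a direct plain-JavaScript analogue: Object.entries, Array.prototype.filter and Object.fromEntries. The sketch below assumes, as the question asks, that both _id and "previous" are left out of each snapshot; `snapshotOf`, `updateWithHistory` and the extra `unit` field are hypothetical, used only to show that fields the code never names are still captured.

```javascript
// In-memory analogue of { $arrayToObject: { $filter: { input: { $objectToArray: "$$ROOT" }, ... } } }.
// Assumption: "_id" and "previous" are excluded from the pushed snapshot.
const EXCLUDED = ["_id", "previous"];

function snapshotOf(root) {
  return Object.fromEntries(                                   // ~ $arrayToObject
    Object.entries(root)                                       // ~ $objectToArray
      .filter(([k]) => !EXCLUDED.includes(k))                  // ~ $filter on "$$root.k"
  );
}

// Push the whole current document (minus excluded keys), then apply the new values.
function updateWithHistory(root, changes) {
  const previous = [...(root.previous ?? []), snapshotOf(root)];
  return { ...root, ...changes, previous };
}

let current = { _id: 1, name: "foo", value: 1, unit: "kg" };
current = updateWithHistory(current, { name: "bar", value: 2 });
console.log(JSON.stringify(current.previous)); // [{"name":"foo","value":1,"unit":"kg"}]
```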

Upvotes: 2
