pocorschi

Reputation: 3665

Python + MongoDB document versioning

I have an app in the works for use internally as a project/task tracker in the company that I'm working for. Playing around with MongoDB atm. I have the following pseudo-schema in mind:

task
    _id
    name
    project
    initial_notes
    versions
        number
        versions
            version_1
                worker
                status
                date(if submitted)
                review_notes(if rejected)
                reply_on(if accepted/rejected)
            (version_n)(if any)

The problem that I'm having is with versioning the task. I've read about a number of possible approaches, but I don't fully understand any of them. I read something that I liked here, and I really like the way mongoid does its versioning.

On second thought, I'd rather have something like this:

task
    _id
    versions
        number_of_versions: 3
        current_version
            version_no: 3
            worker: bob
            status: accepted
        old_versions
            version
                version_no: 2
                worker: bob
                status: rejected
            version
                version_no: 1
                worker: smith
                status: rejected

I would like to show the most recent version only when displaying a collection of tasks and I would like to show all versions of a particular task when entering the detailed information page for that particular task. Would this structure work? If yes, what would be some queries needed to run in order to achieve what I need?

Thank you in advance for your time reading this and maybe answering it.

Upvotes: 4

Views: 2597

Answers (3)

Dash2TheDot

Reputation: 157

I was dealing with the same issue, which is why I created HistoricalCollection:

https://pypi.org/project/historical-collection/

It works just like a normal collection, plus a few additional methods:

  • patch_one()
  • patch_many()
  • find_revisions()
  • latest()

An example of the usage:

from historical_collection.historical import HistoricalCollection
from pymongo import MongoClient


class Users(HistoricalCollection):
    PK_FIELDS = ['username', ]  # <<= This is the only requirement

# ...

users = Users(database=db)

users.patch_one({"username": "darth_later", "email": "[email protected]"})
users.patch_one({"username": "darth_later", "email": "[email protected]", "laser_sword_color": "red"})

list(users.revisions({"username": "darth_later"}))

# [{'_id': ObjectId('5d98c3385d8edadaf0bb845b'),
#   'username': 'darth_later',
#   'email': '[email protected]',
#   '_revision_metadata': None},
#  {'_id': ObjectId('5d98c3385d8edadaf0bb845b'),
#   'username': 'darth_later',
#   'email': '[email protected]',
#   '_revision_metadata': None,
#   'laser_sword_color': 'red'}]

Upvotes: 0

dcrosta

Reputation: 26258

You might also consider a model like this:

task
    _id
    ...
    old_versions [
        {
            retired_on
            retired_by
            ...
        }
    ]

This has the advantage that your current data is always at the top level (you don't need to track the current version explicitly, the document is its own current version), and you can easily track history by taking the current version, removing the old_versions field, and $pushing it to the old_versions field in the db.
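A minimal sketch of that archive-then-update step, written as a plain function that builds the update document. The helper name build_retire_update and the sample task are hypothetical, and leaving _id out of the snapshot is an assumption of this sketch rather than something the model requires.

```python
import copy
from datetime import datetime, timezone


def build_retire_update(current_doc, changes, retired_by, retired_on=None):
    """Build an update that archives current_doc and applies changes.

    current_doc: the task as currently stored (a plain dict).
    changes: top-level fields mapped to their new values.
    """
    if retired_on is None:
        retired_on = datetime.now(timezone.utc)

    # Snapshot the current top-level state, leaving out the history itself
    # and the _id (assumption: the snapshot stays inside the same document).
    snapshot = {
        key: copy.deepcopy(value)
        for key, value in current_doc.items()
        if key not in ("_id", "old_versions")
    }
    snapshot["retired_on"] = retired_on
    snapshot["retired_by"] = retired_by

    update = {"$push": {"old_versions": snapshot}}
    if changes:
        # Only add $set when there is something to set.
        update["$set"] = dict(changes)
    return update


# Hypothetical sample task, for illustration only.
task = {"_id": 1, "name": "Write report", "worker": "bob",
        "status": "submitted", "old_versions": []}
update = build_retire_update(task, {"status": "accepted"}, retired_by="alice")
```

With PyMongo, the result could then be applied with something like tasks.update_one({"_id": task["_id"]}, update), where tasks is the collection. Because the snapshot is built from a document read earlier, concurrent writers could race; this sketch does not guard against that.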

Since you seemed to want to minimize network IO, this also lets you easily avoid loading the old_versions when you don't need them:

> db.tasks.find({...}, {old_versions: 0})

You could also get fancier and store a list of old versions only of fields that have changed. This requires more delicate code in your application layer, and may not be necessary if you don't expect many revisions or for these documents to be very large.
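One way the "changed fields only" idea might look in the application layer is sketched below. The helper name previous_values is hypothetical, it only compares top-level fields, and it does not record fields that are newly added in the new document.

```python
def previous_values(old_doc, new_doc, ignore=("_id", "old_versions")):
    """Return the old values of top-level fields that were changed or removed."""
    missing = object()  # sentinel for "field absent from new_doc"
    return {
        key: value
        for key, value in old_doc.items()
        if key not in ignore and new_doc.get(key, missing) != value
    }


# Hypothetical sample documents, for illustration only.
old = {"_id": 1, "name": "Write report", "worker": "bob", "status": "submitted"}
new = {"_id": 1, "name": "Write report", "worker": "bob", "status": "rejected",
       "review_notes": "too short"}

delta = previous_values(old, new)
```

Here delta holds only the superseded status value, so that is all that would be pushed to old_versions instead of a full copy of the task.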

Upvotes: 3

Zaur Nasibov

Reputation: 22659

Yes, why not. That scheme would work. Also, have you considered something like this:

task 
    ...
    versions = [       # MongoDB array 
        {   version_id 
            worker
            status
            date(if submitted)
            review_notes(if rejected)
            reply_on(if accepted/rejected)
        },
        { version_id : ... }, 
        ...
    ]

Possible version insertion query:

tasks.update_one(
    {'_id': task_id},                       # a query to locate a particular task (task_id is a placeholder)
    {'$push': {'versions': new_version}}    # new_version is a placeholder for the new version document
)

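Note that with this layout the latest version is simply the last element of the versions array. Retrieving it is done by the program after loading the task, unless you ask Mongo for only that element with a $slice projection such as {versions: {$slice: -1}}.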
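A small sketch of picking the latest version out of a versions array like the one above, assuming versions are only ever appended with $push so that the last element is the newest. The names LATEST_ONLY and latest_version are hypothetical. The $slice projection is included as an alternative that lets MongoDB return only the last array element.

```python
# Projection that returns only the last element of the versions array.
# With PyMongo it could be passed as the second argument to find().
LATEST_ONLY = {"versions": {"$slice": -1}}


def latest_version(task):
    """Return the most recent entry of task['versions'], or None if there is none."""
    versions = task.get("versions") or []
    return versions[-1] if versions else None


# Hypothetical sample task, for illustration only.
task = {
    "name": "Write report",
    "versions": [
        {"version_id": 1, "worker": "smith", "status": "rejected"},
        {"version_id": 2, "worker": "bob", "status": "accepted"},
    ],
}
current = latest_version(task)
```

For a task list, something like tasks.find({}, LATEST_ONLY) would keep the other versions off the wire, while the detail page can load the document without the projection to show the full history.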

Upvotes: 4
