Eric
Eric

Reputation: 5091

django database synchronization for an offline usage

I have one master django sever where the data are stored (mysql database).

Online : I would like many users to have a copy from this database synchronized (only delta's must be copied) on their laptops (sqlLite DB)

Offline (users do not have access to the master server) : users can view and update their local database.

Back to Online : what has been modified on users laptops is synchronized back to the master django server.

I think,as I have 2 kind of database, I need to synchronize at django object level. Is there a django application doing that ? If not, how will you procced to code such a feature ?

Upvotes: 20

Views: 5287

Answers (3)

Samuele Mattiuzzo
Samuele Mattiuzzo

Reputation: 11038

Well, I actually don't know if there's a django app to do that, but I will proceed like this:

create a method for "offline_update": connection to the server's database, you select all objects whose id matches the one in the local database. you update the local database. then you select the rest of the entries, and you add them to local database

create a method for "online_update" the same routine, inverted.

PRO: easy to implement (Objects.all() gets you everything, then you manipulate and update, or save directly)

CONS: race conditions (what if 2 users update the same entry (not necessarily on the same time)? who's got the most updated one?)

you basically create a sort of "mysql-svn" to keep the 2 databases updated.

I'm voting +1 to your question, because it's really interesting. I've always worked by dumping the database (via mysql) and then loading into the local database. without using django.

Upvotes: 1

nmgeek
nmgeek

Reputation: 2187

I built a Django app which does this. When model instances are created on the remote/laptop version of the app they are marked as dirty and get a temporary id. The remote app checks regularly for connectivity to the master server. When there is a network connection, i.e. the app is online, it gets a permanent id for each new dirty model instance from the master server. The temporary ids are replaced with permanent ids and then the dirty instances are synchronized to the master.

I used the Django REST framework to receive and update the dirty model instances on the master server.

Note that this also requires running a local web server on the offline computer. I chose CherryPy for that.

Upvotes: 1

Turns out that I'm running a system like this in Django.

This is not a complete answer, just the answer that is currently solving (mostly) the problem.

  • Use of UUIDs for primary keys. That decreases greatly primary keys collision for diferent objects.
  • Use Django's serialization framework for data interchange. The central admin site has an option to download the selected objects in the changelist to a Django-compatible serialized file. Then the user can go offline and start a local admin site, and there, upload the serialized file. When finished offline edition, the same process is used, in the "offline" admin site the objects are serialized to a file, and the uploaded to the central admin site.
  • The serialization frameworks is very useful, since you can get an actual (and unsaved) object, then decide to save it or not, and to modify some fields before the save.

We have run into very little trouble with this simple system, also helped since the content is properly categorized and the editors only create/edit a non overlapped set of categories.

I have talked with this with some people, and proposed me several solutions:

  • Use an timestamp field: That help decide which version to save and one to discard.
  • Use a version fields, with mayor and minor version numbers. Minor editing (like spelling corrections) only updates the minor version number, and major changes updates the mayor version number and sets the minor to 0. That way when comparing you always know which one gets higher priority. However this needs education and conventions within the editing users.
  • Object updates. A separate model, which stores updates coming from offline edits. Then a 'chief' editor merges them into the actual object, helped with some additional admin views to view differences (using google-diff-match-patch and the like). An object can also be flagged to allow direct updates, that is, no storing updates and apply them directly on arrival. The inconvenience is the 'chief' editor has to review the all the updates, and that depends on how much information get updated.

Hope this helps in some way. If anyone decides to implement something of this, I'll love to hear from him.

Upvotes: 5

Related Questions