Finch
Finch

Reputation: 73

How to implement version history for notes in a note taking database application?

So I've built myself a super simple notetaking application using a relational database (if you're curious, I've used Excel VBA + MySQL). The app works fantastically for me as a replacement for Evernote, but I had this other feature idea: Could I implement version control/history for each individual note?

To be clear I'm not talking about version control for the database's records or schema. I'm trying to make a user-facing (not developer) interface to take notes "back in time". So yes, this could be done quite easily by simply assigning a unique ID to each note “thread” in a sense where the thread contains running history of that note, but if possible I’d also like to compress this data as much as possible and only store the differences of what changed.

So for example, if I have a note with body:

“This is the note body. It’s a super long text” And I change it to:

“This is the note body. It’s a very long text”

I would like to not store all those character bytes all over again in the database, and instead somehow store only what changed (“super” -> “very”).

This is similar to how GIT works probably except I don’t need branching capabilities. Would anybody have any suggestions for algorithms on how to do this sort of thing? Thanks!

Upvotes: 3

Views: 771

Answers (2)

boo lean - Go FAT
boo lean - Go FAT

Reputation: 11

Click here for a link to the source of the quote below...

"Local Version Control Systems: It is one of the simplest forms and has a database that kept all the changes to files under revision control. RCS is one of the most common VCS tools. It keeps patch sets (differences between files) in a special format on disk. By adding up all the patches it can then re-create what any file looked like at any point in time."

You're not going to be able to store merely the changes and efficiently add them back up to re-create a previous version.

Original File: 1,2,3,4,5,6,7,7

File entry #2 1,2,3,4,5,6,6,7

store 7 == 6; ##this doesnt tell which 7 to replace

store orig.str[6] == 6; ##this requires parsing, ie. saving whole orig file

now .. assume we save the original file once in the very beginning... the first iteration of the concept would work... it points back to the original save and stores a change to it. When you pull the file version down in order to make changes and create version #3 you end up pulling the original file, then pulling the (first, second, ... 400th, nth) change in order to even get the nth version of the file... Assuming you make n changes to the file, you'd end up having to pull the original file, then pulling every single individual change, in order, so you could construct the nth file version

you would end up being able to push small little changes, and order them, but not only are you going to have to parse the file to create a tricky way of defining that change.. the real problem is going to be pulling it ... slow it will be to pull, pull too much you must. The LOCAL .. aspect is a must for doing this. But, it's doable. Just search up local versioning control systems

Upvotes: 1

Kombajn zbożowy
Kombajn zbożowy

Reputation: 10703

As a first choice, I would stick to store and version entire note as a whole, even if that's just one letter changed. It makes it simple - doesn't require to compute diffs on write and recontruct note on read. Storage is cheap and MySQL performance will surely suffice with small to medium amount of data.

[notes]
note_id  version  text
      1        1  This is the note body. It’s a super long text
      1        2  This is the note body. It’s a very long text
      1        3  This is the note body. It’s a really a very long text

I would only consider following options if you really expect huge number of users and notes, or maybe just doing this for educational purposes.

Instead of versioning notes as a whole you can split it into chunks - it might be paragraphs, sections or any other entity you can distinguish.

[sections]
section_id  text
         1  This is the note body.text
         2  It’s a super long text
         3  It’s a very long text
         4  It’s really a very long text

[notes]
note_id  version  position  section_id
      1        1         1           1
      1        1         2           2
      1        2         1           1
      1        2         2           3
      1        3         1           1
      1        3         2           4

Here notes and their versions reference to specific sections at specific postitions. See how section_id = 1 gets reused in subsequent versions. It also allows a section to be reused across different notes.

Or, as you suggested, you could try to store diffs. For example, using unified diff:

[notes]
note_id  version  text_or_diff
      1        1  This is the note body.
                  It’s a super long text
      1        2  @@ -1,2 +1,2 @@
                   This is the note body.
                  -It’s a super long text
                  +It’s a very long text
      1        3  @@ -1,2 +1,2 @@
                   This is the note body.
                  -It’s a very long text
                  +It’s really a very long text

Here of course the diff is longer than actual text of the note, but with bigger notes it will be more efficient. As mentioned, this comes at a cost - when reading such note you need to load all version records and apply the diffs.

From here you can explore various options and optimizations:

  • Use another diff format
  • Only store diff if it's shorter, otherwise just store full note
  • Split notes into chunks/sections and maintain chunk history as diffs

Upvotes: 4

Related Questions