Tom
Tom

Reputation: 542

Should id or timestamp be used to determine the creation order of rows within a database table? (given possibility of incorrectly set system clock)

A database table is used to store editing changes to a text document.

The database table has four columns: {id, timestamp, user_id, text}

A new row is added to the table each time a user edits the document. The new row has an auto-incremented id, and a timestamp matching the time the data was saved.

To determine what editing changes a user made during a particular edit, the text from the row inserted in response to his or her edit is compared to the text in the previously inserted row.

To determine which row is the previously inserted row, either the id column or the timestamp column could be used. As far as I can see, each method has advantages and disadvantages.

Determining the creation order using id

Determining the creation order using timestamp

I seek a strong argument for choosing one method over the other, or a description of another method that is better than the two I am considering.

Upvotes: 3

Views: 2832

Answers (3)

nickhar
nickhar

Reputation: 20823

Using the sequential id would be simpler as it's probably(?) a primary key and thus indexed and quicker to access. Given that you have user_id, you can quickly assertain the last and prior edits.

Using the timestamp is also applicable, but it's likely to be a longer entry, and we don't know if it's indexed at all, plus the potential for collisions. You rightly point out that system clocks can change... Whereas sequential id's cannot.

Given your update:

As it's difficult to see what your exact requirements are, I've included this as evidence of what a particular project required for 200K+ complex documents and millions of revisions.

From my own experience (building a fully auditable doc/profiling system) for an internal team of more than 60 full-time researchers. We ended up using both an id and a number of other fields (including timestamp) to provide audit-trailing and full versioning.

The system we built has more than 200 fields for each profile and thus versioning a document was far more complex than just storing a block of changed text/content for each one; Yet, each profile could be, edited, approved, rejected, rolled-back, published and even exported as either a PDF or other format as ONE document.

What we ended up doing (after a lot of strategy/planning) was to store sequential versions of the profile, but they were keyed primarily on an id field.

Timestamps

Timestamps were also captured as a secondary check and we made sure of keeping system clocks accurate (amongst a cluster of servers) through the use of cron scripts that checked the time-alignment regularly and corrected them where necessary. We also used Ntpd to prevent clock-drift.

Other captured data

Other data captured for each edit also included (but not limited to):

User_id
User_group
Action
Approval_id

There were also other tables that fulfilled internal requirements (including automatically generated annotations for the documents) - as some of the profile editing was done using data from bots (built using NER/machine learning/AI), but with approval being required by one of the team before edits/updates could be published.

An action log was also kept of all user actions, so that in the event of an audit, one could look at the actions of an individual user - even when they didn't have the permissions to perform such an action, it was still logged.

With regard to migration, I don't see it as a big problem, as you can easily preserve the id sequences in moving/dumping/transferring data. Perhaps the only issue being if you needed to merge datasets. You could always write a migration script in that event - so from a personal perspective I consider that disadvantage somewhat diminished.

It might be worth looking at the Stack Overflow table structures for there data explorer (which is reasonably sophisticated). You can see the table structure here: https://data.stackexchange.com/stackoverflow/query/new, which comes from a question on meta: How does SO store revisions?

As a revision system, SO works well and the markdown/revision functionality is probably a good example to pick over.

Upvotes: 1

Nick.Mc
Nick.Mc

Reputation: 19184

Or add another column whose sole purpose is to record the editing order. I suggest you do not use datetime for this.

Upvotes: 1

Bohemian
Bohemian

Reputation: 425003

Use Id. It's simple and works.

The only caveat is if you routinely add rows from a store-and-forward server so rows may be added later but should treated as being added earlier

Upvotes: 1

Related Questions