Reputation: 713

Is there any reason the Cassandra LWW 'timestamp' has to be a real timestamp?

When inserting data into Apache Cassandra, you can specify a custom client-side timestamp with USING TIMESTAMP <xxx>, for finer control over the last-write-wins semantics. The examples in the docs all use timestamps in microseconds, just generated client-side.

Is there any good reason I can't instead use a generated number that's nowhere near a realistic microsecond timestamp (e.g. the output of a logical clock that fits in an int64), if it would be convenient for my application to get last-write-wins semantics based on that number?

I've tried this in cqlsh and everything seems to work fine. (In particular, TTL expiry seems to work as normal, which was the thing I thought unrealistic timestamps might break.) But I'm worried I might be missing something, e.g. if Cassandra uses the timestamps as part of the compaction strategy or for anything else beyond picking a winner among conflicting cells for the same primary key.

Upvotes: 0

Answers (2)

Erick Ramirez

Reputation: 16353

There is no real reason, other than that Cassandra expects the write-time to be a timestamp with microsecond precision.

You are right that you can supply some notional integer as the write-time, but the server will still interpret it as microseconds. The danger is that if you supply a value equivalent to a timestamp far in the future, a simple DELETE statement will not remove the data: the DELETE gets a "normal" timestamp, which appears older than the INSERT or UPDATE, so the earlier write still wins.
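To make the hazard concrete, here is a toy Python model of last-write-wins reconciliation for a single cell. It is a sketch, not Cassandra code: the `Cell` class and `reconcile` function are hypothetical names, the only rule modelled is "the higher write timestamp wins", and tie-breaking is not modelled at all.

```python
from dataclasses import dataclass
from typing import Optional
import time


@dataclass(frozen=True)
class Cell:
    """Toy stand-in for one stored cell (hypothetical, not a Cassandra type)."""
    timestamp: int         # write-time, interpreted as microseconds
    value: Optional[str]   # None models a tombstone left by a DELETE


def reconcile(a: Cell, b: Cell) -> Cell:
    """Last-write-wins: keep whichever cell carries the higher timestamp."""
    return a if a.timestamp > b.timestamp else b


now_micros = int(time.time() * 1_000_000)

# Models INSERT ... USING TIMESTAMP <a value far in the future>.
far_future = Cell(timestamp=now_micros * 10, value="hello")

# Models a plain DELETE issued afterwards, stamped with the current time.
tombstone = Cell(timestamp=now_micros + 1, value=None)

survivor = reconcile(far_future, tombstone)
print(survivor.value)  # the insert survives; the delete is shadowed
```

In this model the delete only takes effect if it is itself issued with USING TIMESTAMP and a value higher than the one used for the insert.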

Interestingly, I can't think of a use case where a notional, fictional, or arbitrarily chosen write-time is useful for an application. In particular, this caught my attention:

... it would be convenient for my application to get last-write-wins semantics ...

It doesn't make sense to me why you would want to do that in your application. It assumes that your application somehow can retrieve all versions of a cell/column/row/partition from a table when Cassandra will only return the latest version (not any of the older versions) in the result set.

If you have some sort of versioning use case where you want to store multiple versions of data that mutates over time, maybe an arbitrary write-time is not what you want. Maybe you just need to model the data differently.

For example, to keep track of the different mobile numbers a person has had, I would model the data clustered by the date or time when the number changed:

CREATE TABLE mobiles_by_username (
    username text,
    changedate date,
    mobile text,  -- text rather than int: phone numbers can overflow a 32-bit int
    ...
    PRIMARY KEY (username, changedate)
) WITH CLUSTERING ORDER BY (changedate DESC);

With this schema, each user (partitioned by username) has multiple entries for the mobile phone numbers they've had, sorted from latest to oldest. It stores "versions" of their numbers along with the date on which each one changed. Cheers!

Upvotes: -1

Andrew

Reputation: 27294

My initial thought is that using any 'custom clock' for the timestamps, which is what this is, would make a lot of additional tooling problematic at best and non-functional at worst, since that tooling would not support the custom clock.

The expiry time is calculated at insertion time from the TTL and the current time, not from the write timestamp, so it is not surprising that TTL expiry is unaffected.
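A rough sketch of that point, again as a toy Python model rather than Cassandra code: the expiry time is taken to be the server's clock reading at write time plus the TTL, and the write timestamp plays no part. `expiry_seconds` and its parameters are hypothetical names chosen for illustration.

```python
def expiry_seconds(server_now_seconds: int, ttl_seconds: int, write_timestamp: int) -> int:
    """Toy model: expiry = server clock at write time + TTL.

    write_timestamp is accepted only to show that this model ignores it.
    """
    return server_now_seconds + ttl_seconds


server_now = 1_700_000_000  # example wall-clock reading, in seconds

# Same TTL, one realistic microsecond timestamp and one logical-clock value.
realistic = expiry_seconds(server_now, ttl_seconds=60, write_timestamp=1_700_000_000_000_000)
logical = expiry_seconds(server_now, ttl_seconds=60, write_timestamp=42)

print(realistic == logical)  # True: both cells expire at the same moment
```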

Upvotes: 1
