Apache Cassandra schema design with JSON

Question

Suppose we have a column family (CF) with user information:

{
   123 => { first_name => Nick, last_name => Schiff, age => 23, city => NY }
}

Suppose also that we do not search by column name; we use the information just to display the data. Individual columns are also not updated very often (e.g. a change of first_name).

Maybe in this case a single JSON-encoded column is a better idea:

{
   123 => { data => [json], city => NY }
}

We leave "city" as its own column because, let's say, we will update it often.
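
A minimal sketch of that layout in Python, assuming the application serializes the rarely-changed fields itself before writing. The helper names `pack_user` and `unpack_user` and the `hot` parameter are hypothetical, and no Cassandra client is involved; the dictionaries only stand in for a row.

```python
import json

def pack_user(fields, hot=("city",)):
    """Split fields into separately stored 'hot' columns plus one JSON 'data' column."""
    # Frequently updated fields keep their own columns.
    row = {name: fields[name] for name in hot if name in fields}
    # Everything else goes into a single encoded blob.
    cold = {name: value for name, value in fields.items() if name not in hot}
    # sort_keys keeps the encoding deterministic, which helps when comparing copies.
    row["data"] = json.dumps(cold, sort_keys=True)
    return row

def unpack_user(row):
    """Rebuild the full field dictionary from a packed row."""
    fields = json.loads(row["data"])
    fields.update({name: value for name, value in row.items() if name != "data"})
    return fields

user = {"first_name": "Nick", "last_name": "Schiff", "age": 23, "city": "NY"}
row = pack_user(user)
```

Here `row` has only two columns, `city` and `data`, and `unpack_user(row)` gives back the original fields for display.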

The pros of JSON are:

1. Easy denormalization: you copy just one column, e.g. "data".
2. You do not need to know the column names, so you do not need to slice() before a delete.
3. It emulates a super column without composite keys; this is a bit like (1).

The cons I can see:

1. There is no validation of the JSON values.
2. Cassandra does not know what the stored values are.
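
Because the blob is opaque to Cassandra, any checking would have to happen in the application before the write. A rough sketch of what that could look like, assuming a hypothetical `validate_user_blob` helper and a hypothetical `EXPECTED_TYPES` table (neither comes from Cassandra or any client library):

```python
import json

# Hypothetical schema: field name -> expected Python type after decoding.
EXPECTED_TYPES = {"first_name": str, "last_name": str, "age": int}

def validate_user_blob(blob):
    """Return a list of problems found in a JSON blob; an empty list means it passed."""
    try:
        fields = json.loads(blob)
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        return [f"not valid JSON: {exc}"]
    if not isinstance(fields, dict):
        return ["top-level value is not an object"]
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in fields:
            problems.append(f"missing field: {name}")
        elif not isinstance(fields[name], expected):
            problems.append(f"wrong type for {name}")
    return problems
```

A writer would call this on the encoded string and refuse to store it if the returned list is not empty.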

Does anyone work like this? Is there something I am missing here?

psanford · Accepted Answer

This could be a reasonable strategy depending on your usage model. The biggest downside to storing data in a blob format is how you handle concurrent updates. Say you have two processes, one trying to update the first_name field, the other trying to update the age field. Each process has to read the row to get the current blob, update the field that is to change, and write the whole blob back to Cassandra. When all your data is stored in one blob and both processes read before either writes, the second writer will essentially undo the changes of the first.

If these were stored as separate columns there would be no update conflict.
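
The two cases can be simulated without a cluster. This is only an illustration: plain Python dictionaries stand in for the row, the interleaving of the two writers is fixed by hand, and the new values ("Nicholas", 24) are made up for the example.

```python
import json

# A plain dict stands in for the Cassandra row; no real client is used.
blob_row = {"data": json.dumps({"first_name": "Nick", "age": 23})}

# Both writers read the same blob before either one writes back.
copy_a = json.loads(blob_row["data"])
copy_b = json.loads(blob_row["data"])

copy_a["first_name"] = "Nicholas"
blob_row["data"] = json.dumps(copy_a)  # writer A writes the whole blob

copy_b["age"] = 24
blob_row["data"] = json.dumps(copy_b)  # writer B overwrites it with a stale first_name

blob_result = json.loads(blob_row["data"])

# With separate columns, each writer touches only its own column.
column_row = {"first_name": "Nick", "age": 23}
column_row["first_name"] = "Nicholas"  # writer A
column_row["age"] = 24                 # writer B
```

After this runs, `blob_result` still has the old first_name because writer B's copy never saw writer A's change, while `column_row` holds both updates.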

But perhaps your records are immutable, in which case this concurrent update issue would not be a problem.
