GantengX

Reputation: 1501

Best practice to update large number of rows in Cassandra reliably (relational update)

I have a few tables that are related to each other, it looks something like this:

organizations: 
- id
- name
- ... other fields

users:
- id
- name
- organization_id
- organization_name
- ... other fields

I keep the organization_name field in the users table so that reads don't have to look up the organizations table to get the organization name.

The problem is that if an organization's name changes, all users related to that organization must be updated to reflect the new name. In my real scenario there are more tables that store organization_name.

Problem: Currently I just fire the update statements asynchronously, and if the process fails halfway through I end up with inconsistent data.
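The inconsistency can be sketched in plain Python (the data layout and names are hypothetical; no Cassandra driver is involved):

```python
# Simulate the partial-failure problem: rename an organization, but
# "crash" after only some of the denormalized user rows are updated.
users = [
    {"id": i, "organization_id": 1, "organization_name": "Acme"}
    for i in range(10)
]

def rename_org_unsafely(users, org_id, new_name, crash_after):
    """Fire updates one by one; simulate a crash partway through."""
    count = 0
    for user in users:
        if user["organization_id"] != org_id:
            continue
        if count == crash_after:
            raise RuntimeError("worker died mid-update")
        user["organization_name"] = new_name
        count += 1

try:
    rename_org_unsafely(users, org_id=1, new_name="Acme Corp", crash_after=5)
except RuntimeError:
    pass  # in a real app the process just dies here

names = {u["organization_name"] for u in users}
print(names)  # both the old and the new name survive -> inconsistent data
```

After the crash, half the rows carry the old name and half the new one, which is exactly the state described above.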

Question: Is there a best practice for dealing with this sort of issue?

Possible solutions:

PS: my rows are not too wide; at most I have about 20 columns per table.


Update:

Forgot to add: this is a webapp where updates need to be reflected as soon as possible, so a batch job isn't applicable.


Update 2:

Regarding the read pattern: my current example is oversimplified, but in any case I need to fetch lists of users (possibly from multiple organizations). Such a query might return thousands of users across hundreds of organizations, which is why I stored organization_name in the users table; my understanding is that with Cassandra, denormalizing data is the way to go.

Upvotes: 2

Views: 1479

Answers (2)

nevsv

Reputation: 2466

Try to work with paging. Most drivers support it.

1) Fetch the rows to update from the users table, with a page size of x rows per page.

2) Run an async update for each record in the page.

3) Move to the next page.
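The steps above can be sketched in plain Python. With the DataStax Python driver you would set fetch_size on a statement and iterate the result set page by page; here the paging is simulated in memory so the loop logic stands on its own:

```python
# Page-by-page update loop (paging simulated in memory).
def pages(rows, page_size):
    """Yield successive pages of at most page_size rows."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]

def update_organization_name(users, org_id, new_name, page_size=100):
    """Update the denormalized name one page at a time."""
    matching = [u for u in users if u["organization_id"] == org_id]
    updated = 0
    for page in pages(matching, page_size):
        # 2) run the (async) update for each record in the page ...
        for user in page:
            user["organization_name"] = new_name
            updated += 1
        # 3) ... then move on to the next page
    return updated

users = [
    {"id": i, "organization_id": 1, "organization_name": "Acme"}
    for i in range(250)
]
print(update_organization_name(users, org_id=1, new_name="Acme Corp"))  # 250
```

Paging bounds how much data sits in memory at once, but note it does not by itself make the run crash-safe; for that, combine it with a bookmark as the other answer suggests.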

Upvotes: 2

xmas79

Reputation: 5180

Like in every long-running update process, you should use the concept of a bookmark:

  • Run a job of (say) 100 async updates, then record somewhere that you have just finished updating 100 rows.
  • Run another job of another 100 rows, then bookmark that you've now updated 200 rows.
  • And so on...

In the event of a crash, you simply resume where you left off by reading your bookmark.

To perform such a task you must already know which records have to be updated, but I'm assuming you either know them or know how to retrieve that information.
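A minimal sketch of the bookmark idea, assuming the set of row ids to update is known up front. The bookmark store here is a plain dict standing in for any durable storage (a Cassandra table, a file, ...):

```python
# Process ids in fixed-size jobs, persisting progress after each job,
# and resume from the bookmark after a crash.
bookmark_store = {"progress": 0}  # stand-in for durable storage

def run_update(ids, apply_update, job_size=100, fail_at=None):
    """Apply apply_update to each id, checkpointing after every job."""
    start = bookmark_store["progress"]  # resume where we left off
    for i in range(start, len(ids)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("crash mid-run")
        apply_update(ids[i])
        if (i + 1) % job_size == 0:
            bookmark_store["progress"] = i + 1  # "done 100", "done 200", ...
    bookmark_store["progress"] = len(ids)

ids = list(range(350))
done = []
try:
    run_update(ids, done.append, job_size=100, fail_at=250)
except RuntimeError:
    pass  # crashed after the 200-row checkpoint
run_update(ids, done.append, job_size=100)  # resumes from the bookmark
```

Because the second run resumes from the last checkpoint (200) rather than the exact crash point (250), rows in between are applied twice; that is harmless here, since re-setting organization_name to the same value is idempotent, which is exactly why this pattern fits Cassandra-style updates.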

Upvotes: 3
