Reputation: 1501
I have a few tables that are related to each other, it looks something like this:
organizations:
- id
- name
- ... other fields
users:
- id
- name
- organization_id
- organization_name
- ... other fields
I keep organization_name
field in the users
table so that it doesn't have to look up to organization to get the organization name
The problem is that if organization name is changed, all users related to the organization must be updated to reflect the new name. In my real scenario there are more tables where I store organization_name
on.
Problem: Currently I just fire up the update statement asynchronously and if it fails halfway then I'll end up with inconsistent data
Question: Is there a best practice how to deal with this sort of issue?
Possible solutions:
BATCH
statement. But I found it very limiting since by default it only allows 50kb query size (in my case 1 update might lead up to updating 8,000 other entities from two or three different tables with varying length in the field values - so query size is rather unpredictable)
BATCH
statement to update 100 items (out of 600 that needs to be updated) and it failed with "Batch Size Too Large" exception...PS - my rows are not too wide, at most I have about 20 columns per table
Update:
Forgot to add, this is a webapp where update needs to be reflected as soon as possible, so batch job won't be applicable
Update 2:
Regarding read pattern, my current example is oversimplified, but in any case I would require to fetch list of users (it can be from multiple organizations) - this might return over thousands of users over hundreds of organizations which is why I stored organization_name
in the users
table as my understanding is that with Cassandra data denormalization is the way to go
Upvotes: 2
Views: 1479
Reputation: 2466
Try to work with paging. Most drivers support it.
1) Receive the results for update from users table, with paging of x row in each page.
2) Run async update for each record in the page.
3) Move to next page.
Upvotes: 2
Reputation: 5180
Like in every long-running update process, you should use the concept of bookmark:
In the event of a crash, you will just resume where you crashed by reading your bookmark.
To perform such task you must already know what records you have to update, but I'm assuming you already know them or know how to retrieve that information.
Upvotes: 3