djechlin

Reputation: 60748

Minimizing inconsistency between tables in denormalized databases like Cassandra

Cassandra (and BigTable, etc) recommends a denormalized database, where tables are designed from the expected queries. The Cassandra doc uses this example:

hotels_by_poi:   poi_name (Key)
                 hotel_id (Cluster key)
                 name
                 phone
                 address

hotels:          hotel_id (Key)
                 name
                 phone
                 address

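In CQL, that schema could be sketched roughly as follows (the column types are assumptions; the doc's example only names the columns and keys):

```sql
-- Lookup table: one partition per point of interest,
-- one row per hotel near it (hotel_id is the clustering key).
CREATE TABLE hotels_by_poi (
    poi_name text,
    hotel_id text,
    name     text,
    phone    text,
    address  text,
    PRIMARY KEY (poi_name, hotel_id)
);

-- Canonical table: one partition per hotel.
CREATE TABLE hotels (
    hotel_id text PRIMARY KEY,
    name     text,
    phone    text,
    address  text
);
```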
So name, phone, and address are denormalized between hotels_by_poi and hotels. What I'm wondering about is how to implement this method:

update_hotel_info(hotel_id, name, phone, address) {
    updateHotel(hotel_id, name, phone, address);
    updatePoisByHotel(hotel_id, name, phone, address);
}

It's possible that the first call fails, or that the server running the two calls crashes between the first and second update. Either way the two tables get out of sync, and without doing anything else the data isn't even eventually consistent.

Upvotes: 1

Views: 287

Answers (2)

Manish Khandelwal

Reputation: 2310

As @Erick mentioned, either use a batch to maintain consistency, or handle it on the client side by retrying failed inserts/updates. For example:

update_hotel_info(hotel_id, name, phone, address) {
    updateHotel(hotel_id, name, phone, address);
    updatePoisByHotel(hotel_id, name, phone, address);
}

You can retry update_hotel_info if either insert/update fails. This way you get fast writes and can take advantage of Cassandra's cheap writes.

Upvotes: 0

Erick Ramirez

Reputation: 16293

The idea is to wrap the related table updates in a CQL BATCH statement as I've explained here -- https://community.datastax.com/articles/2744/.
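Using the question's schema, a logged batch might look like the following sketch (the literal values, and the fact that you'd run one `hotels_by_poi` update per POI the hotel appears under, are assumptions for illustration):

```sql
BEGIN BATCH
    UPDATE hotels
       SET name = 'Grand Hotel', phone = '555-0100', address = '1 Main St'
     WHERE hotel_id = 'h1';
    UPDATE hotels_by_poi
       SET name = 'Grand Hotel', phone = '555-0100', address = '1 Main St'
     WHERE poi_name = 'museum' AND hotel_id = 'h1';
APPLY BATCH;
```

A logged batch guarantees that if any of the statements is applied, all of them eventually will be, which is exactly the "not even eventually consistent" gap the question describes; it does not provide isolation, and it costs more than individual writes.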

Even if you didn't use CQL batches, the idea is that if either of those writes fails, you should have error handling that (for example) retries the request to make sure both succeed. Cheers!

Upvotes: 2
