Ben Thorson

Reputation: 115

Cassandra CQL3 update slow performance on a single wide row

I am attempting to use the following CQL3 statement to update a column family 50k times:

 update column_family
 set    value_1    = ?,   
        value_2    = ?,   
        value_3    = ?,   
        value_4    = ?    
 where  partition_key = ?                
 and    column_key    = ?;     

The important point here is that the partition_key is the same for all 50k records.

I either send Cassandra this query 50k times, or batch the statements up 5000 at a time using BEGIN BATCH ... APPLY BATCH. Either way, it takes roughly 10 minutes, with no network latency to speak of. I know that the internal structure is one wide row. Is this why it is slow?
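For reference, the "5000 at a time" grouping can be sketched like this (a minimal Python sketch; the `chunk` helper and the `rows` data are hypothetical stand-ins, and actual execution would send each group through a CQL3 driver inside BEGIN BATCH ... APPLY BATCH):

```python
def chunk(items, size):
    """Yield successive slices of `items` with at most `size` elements each."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 50k parameter tuples, all sharing the same partition key,
# as in the question.
rows = [("partition-1", n) for n in range(50_000)]

# Group into batches of 5000 statements each.
batches = list(chunk(rows, 5_000))
# -> 10 batches of 5000 rows
```

Each batch here would then be bound to the prepared UPDATE and executed as one BATCH statement.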

Also do I have the internal structure correct? If the CF creation CQL looks like this:

create table column_family (
    partition_key varchar,
    column_key uuid,
    value_1 int,
    value_2 timestamp,
    value_3 double,
    value_4 double,
    PRIMARY KEY(partition_key , column_key)               
);

Then my internal CF would have partition_key as the partition key, and the column keys would be column_key(0)#value_1, column_key(0)#value_2, column_key(0)#value_3, column_key(0)#value_4, column_key(1)#value_1, and so on.

Upvotes: 1

Views: 1115

Answers (2)

John

Reputation: 1462

To rule out a performance issue on the server side, you should check the write latency.

You can check it using DataStax OpsCenter. If you don't have that, try the command-line tool that ships with C*:

nodetool cfhistograms keyspacename cfname

Upvotes: 1

Theo

Reputation: 132862

50K inserts in 10 minutes is 12 ms per insert on average. While that is slow, it's not extremely slow. If you have a slow network, do work between requests, or send only one request at a time, I can totally see that being normal. It's unlikely that Cassandra is your bottleneck. Try parallelizing your client, or try a driver that supports CQL3 pipelining (like the DataStax driver).
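The parallelization suggestion can be sketched as follows. This is a minimal Python sketch, not driver code: `send_update` is a hypothetical stand-in for executing the prepared UPDATE against the cluster (a real client would call something like the driver's execute or execute_async on a session), and the thread pool just shows many requests in flight at once instead of one at a time:

```python
from concurrent.futures import ThreadPoolExecutor

def send_update(params):
    """Hypothetical stand-in for one UPDATE round trip to Cassandra.
    Here it just echoes its input to keep the sketch self-contained."""
    return params

# Parameter tuples for the writes (small count for illustration).
rows = [("partition-1", n) for n in range(1_000)]

# Issue many requests concurrently; with sequential requests the total
# time is dominated by per-request latency, not by Cassandra itself.
with ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(send_update, rows))
```

The same effect can be had without threads by using a driver's asynchronous execution, which pipelines many outstanding requests over a single connection.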

Upvotes: 3
