Reputation: 115
I am attempting to use the following CQL3 statement to update a column family 50k times:
update column_family
set value_1 = ?,
value_2 = ?,
value_3 = ?,
value_4 = ?
where partition_key = ?
and column_key = ?;
The important piece to state here is that the partition_key is the same for all 50k records.
I either send Cassandra this query 50k times, or batch up 5000 at a time using BATCH ... APPLY BATCH. Either way, it takes roughly 10 minutes, with no network latency to speak of. I know that the internal structure is one wide row. Is this why it is slow?
Also, do I have the internal structure correct? If the CF creation CQL looks like this:
create table column_family (
partition_key varchar,
column_key uuid,
value_1 int,
value_2 timestamp,
value_3 double,
value_4 double,
PRIMARY KEY(partition_key , column_key)
);
Then my internal CF would have partition_key as the partition key, and the column keys would be column_key(0)#value_1, column_key(0)#value_2, column_key(0)#value_3, column_key(0)#value_4, column_key(1)#value_1 .......
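To make the layout I have in mind concrete, here is a minimal Python sketch of how I picture the CQL rows of one partition being flattened into internal cells. It assumes the older (pre-3.0) storage layout, where each cell name is a composite of the clustering value and the CQL column name. The function name internal_cells and the dict-based row format are made up for illustration, and details such as the extra row-marker cell that an INSERT writes are left out.

```python
from uuid import UUID

# Non-key columns of the table, in the order they appear within one CQL row
VALUE_COLUMNS = ("value_1", "value_2", "value_3", "value_4")


def internal_cells(cql_rows):
    """Flatten CQL rows that share one partition key into (cell_name, value)
    pairs, where cell_name is the composite (column_key, cql_column_name)."""
    cells = []
    # Cells are kept sorted by clustering value first, then by column name
    for row in sorted(cql_rows, key=lambda r: r["column_key"]):
        for name in VALUE_COLUMNS:
            cells.append(((row["column_key"], name), row[name]))
    return cells


if __name__ == "__main__":
    rows = [
        {"column_key": UUID(int=2), "value_1": 7, "value_2": None,
         "value_3": 0.5, "value_4": 1.5},
        {"column_key": UUID(int=1), "value_1": 3, "value_2": None,
         "value_3": 2.0, "value_4": 4.0},
    ]
    for cell_name, value in internal_cells(rows):
        print(cell_name, value)
```

With 50k CQL rows and four value columns, this picture gives roughly 200k cells in the single wide row.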
Upvotes: 1
Views: 1115
Reputation: 1462
To rule out a performance issue on the server side, you should check the write latency.
You can check it using DataStax OpsCenter. If you don't have that, try the command-line tool that comes with C*:
nodetool cfhistograms keyspacename cfname
Upvotes: 1
Reputation: 132862
50K inserts in 10 minutes (600 s / 50,000) is 12 ms per insert on average. While that is slow, it's not extremely slow. If you have a slow network, do work between requests, and send only one request at a time, then I can totally see it being normal. It sounds unlikely that Cassandra is your bottleneck. Try parallelizing your client, or try a driver that supports CQL3 pipelining (like the DataStax driver).
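As a rough illustration of what parallelizing the client could look like, here is a minimal Python sketch that keeps several requests in flight at once using a thread pool. The names run_updates_in_parallel and fake_execute_update are hypothetical; the fake function just sleeps for 12 ms to simulate a round trip, and in real code you would replace it with your driver's execute call for the prepared UPDATE.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_updates_in_parallel(execute_update, rows, max_in_flight=32):
    """Call execute_update(row) for every row, with up to max_in_flight
    calls running at the same time. Results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # list() waits for every call and re-raises the first failure, if any
        return list(pool.map(execute_update, rows))


def fake_execute_update(row):
    # Hypothetical stand-in for a real driver call; simulates a 12 ms round trip
    time.sleep(0.012)
    return row


if __name__ == "__main__":
    rows = [("same_partition", i, 1.0, 2.0) for i in range(200)]
    start = time.perf_counter()
    run_updates_in_parallel(fake_execute_update, rows)
    elapsed = time.perf_counter() - start
    print(f"{len(rows)} simulated updates in {elapsed:.2f}s")
```

Sent one at a time, 200 requests at 12 ms each would take about 2.4 s; with many in flight, the total should be much closer to the latency of a few round trips. An asynchronous driver gets the same effect without threads, by pipelining requests over the connection.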
Upvotes: 3