Reputation: 101
Trying to find out why a Cassandra read is taking so long, I used tracing and limited the number of rows. Strangely, when I query 600 rows I get results in ~50 milliseconds, but 610 rows takes nearly 1 second!
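(For reference, "tracing" here just means enabling it in the shell before running the queries:
cqlsh> TRACING ON;
The session IDs and timings below are from those traced runs.)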
cqlsh> select containerdefinitionid from containerdefinition limit 600;
... lots of output ...
Tracing session: 6b506cd0-83bc-11e3-96e8-e182571757d7
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------------------------------+--------------+---------------+----------------
execute_cql3_query | 15:25:02,878 | 130.4.147.116 | 0
Parsing statement | 15:25:02,878 | 130.4.147.116 | 39
Preparing statement | 15:25:02,878 | 130.4.147.116 | 101
Determining replicas to query | 15:25:02,878 | 130.4.147.116 | 152
Executing seq scan across 1 sstables for [min(-9223372036854775808), min(-9223372036854775808)] | 15:25:02,879 | 130.4.147.116 | 1021
Scanned 755 rows and matched 755 | 15:25:02,933 | 130.4.147.116 | 55169
Request complete | 15:25:02,934 | 130.4.147.116 | 56300
cqlsh> select containerdefinitionid from containerdefinition limit 610;
... just about the same output and trace info, except...
Scanned 766 rows and matched 766 | 15:25:58,908 | 130.4.147.116 | 739141
There seems to be nothing unusual about the data in those particular rows:
- values are similar to those before and after
- using the COPY command I can export the whole table and import it on a different cluster, and performance is fine
- these rows are just the first example; there seem to be other places where query time jumps as well

The whole table is only ~3000 rows but takes ~15 sec to list all primary keys.
There does, however, seem to be something unusual about the data STORAGE:
- a snapshot copied to another cluster and imported gives the same slow results at the same limits
- COPYing the data to CSV and loading it into another cluster does not, and performance is great (commands below)
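For completeness, the COPY round-trip mentioned above was essentially the following (the CSV file name is just an example):
cqlsh> COPY containerdefinition TO 'containerdefinition.csv';
cqlsh> COPY containerdefinition FROM 'containerdefinition.csv';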
Have tried compaction, repair, reindex, cleanup and refresh. No effect.
I realize I could "fix" by copying data out and in, but I'm trying to figure out what is going on here to avoid it happening in production on a table too big to fix with COPY.
The table has 17 columns, 3 secondary indices, a TEXT primary key, two LIST columns, and two TIMESTAMP columns; the rest are TEXT (rough sketch of the shape below). I can reproduce the issue with both SimpleStrategy and DC-aware replication, and with 4 copies of the data on 4 servers, 2 copies on 2 servers, and 1 copy on 2 servers (so it doesn't matter whether the query is served locally or involves multiple servers). This is Cassandra 1.2 with cqlsh.
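For what it's worth, the table is shaped roughly like this (column names here are invented; only the structure matches the real table):
CREATE TABLE containerdefinition (
    containerdefinitionid text PRIMARY KEY,
    created timestamp,
    modified timestamp,
    labels list<text>,
    notes list<text>,
    name text,
    owner text,
    description text
    -- ...plus the remaining TEXT columns
);
CREATE INDEX ON containerdefinition (name);
CREATE INDEX ON containerdefinition (owner);
CREATE INDEX ON containerdefinition (description);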
Any ideas? Suggestions?
Upvotes: 2
Views: 260
Reputation: 4600
Any chance you have the row cache enabled for this table? The row cache keeps recently accessed rows in memory, so reads served from it can be much faster than reads that have to go to disk.
The key cache, which holds partition keys and their offsets on disk, can also provide a significant speedup.
Can you let me know what settings you are currently using for the row cache and key cache?
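For example, on Cassandra 1.2 you can see the table's current caching mode with DESCRIBE, and the global sizes are key_cache_size_in_mb and row_cache_size_in_mb in cassandra.yaml. Something along these lines (the ALTER is just an example of switching to key caching only):
cqlsh> DESCRIBE TABLE containerdefinition;
cqlsh> ALTER TABLE containerdefinition WITH caching = 'keys_only';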
Upvotes: 0