Reputation: 349
I was profiling an application that uses Cassandra, and it turned out that reads were the bottleneck. On closer inspection they seem to take way too long; I would really appreciate some help in understanding why.
The application always reads the whole set of rows for a given partition key (the query is of the form SELECT * FROM table WHERE partition_key = ?). Unsurprisingly, the read time is O(number of rows for the partition key); however, the constant seems way too high. After examining the query trace, it turns out that the majority of the time is spent on "merging data from memtables and sstables".
This step takes over 200ms for a partition key with ~5000 rows, where each row consists of 9 columns and is less than 100 bytes. Given the read throughput of an SSD, reading 0.5MB sequentially should be near-instantaneous.
Actually, I doubt this has anything to do with I/O at all. The machine used to have a spinning disk, which was replaced with the SSD it has now; the change had no impact on query performance. I think there is something very involved in Cassandra's processing, or in how it reads the data off disk, that makes this operation expensive.
Merging from more than one SSTable or iterating over tombstoned cells does not explain this. First of all, it should take milliseconds; second, this is happening consistently, regardless of whether 2 or 4 SSTables are involved and whether or not there are tombstoned cells.
To give some background:
Hardware: The machine running Cassandra is an 8-core, bare-metal, SSD-backed box. I query it from cqlsh on the same machine, and the data is stored locally. There is no other load on it, and looking at iostat, there is barely any I/O.
Data model: The partition key, PK, is of type text; the primary key is a composite of the partition key and a bigint column K; the rest are 7 mutable columns. The schema creation command is listed below.
CREATE TABLE inboxes (
PK text,
K bigint,
A boolean,
B boolean,
C boolean,
D boolean,
E bigint,
F bigint,
G int,
PRIMARY KEY (PK, K)
) WITH CLUSTERING ORDER BY (K DESC);
This is an example trace, with 3 SSTables involved and a fairly large number of tombstones.
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------------------------+--------------+-------------+----------------
execute_cql3_query | 03:14:07,507 | 10.161.4.77 | 0
Parsing select * from table where PK = 'key_value' LIMIT 10000; | 03:14:07,508 | 10.161.4.77 | 123
Preparing statement | 03:14:07,508 | 10.161.4.77 | 244
Executing single-partition query on table | 03:14:07,509 | 10.161.4.77 | 1155
Acquiring sstable references | 03:14:07,509 | 10.161.4.77 | 1173
Merging memtable tombstones | 03:14:07,509 | 10.161.4.77 | 1195
Key cache hit for sstable 2906 | 03:14:07,509 | 10.161.4.77 | 1231
Seeking to partition beginning in data file | 03:14:07,509 | 10.161.4.77 | 1240
Key cache hit for sstable 1533 | 03:14:07,509 | 10.161.4.77 | 1550
Seeking to partition beginning in data file | 03:14:07,509 | 10.161.4.77 | 1561
Key cache hit for sstable 1316 | 03:14:07,509 | 10.161.4.77 | 1867
Seeking to partition beginning in data file | 03:14:07,509 | 10.161.4.77 | 1878
Merging data from memtables and 3 sstables | 03:14:07,510 | 10.161.4.77 | 2180
Read 5141 live and 1944 tombstoned cells | 03:14:07,646 | 10.161.4.77 | 138734
Request complete | 03:14:07,742 | 10.161.4.77 | 235030

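As a sanity check, the per-cell cost implied by this trace can be worked out with a quick back-of-the-envelope calculation (a rough sketch using only the source_elapsed figures above):

```python
# Back-of-the-envelope check using the source_elapsed values from the trace
# above (all times in microseconds).
merge_start_us = 2180    # "Merging data from memtables and 3 sstables"
merge_end_us = 138734    # "Read 5141 live and 1944 tombstoned cells"
live_cells = 5141
tombstoned_cells = 1944

merge_us = merge_end_us - merge_start_us              # ~136.6 ms spent merging
per_cell_us = merge_us / (live_cells + tombstoned_cells)

print(f"merge step:    {merge_us / 1000:.1f} ms")     # 136.6 ms
print(f"cost per cell: {per_cell_us:.1f} us")         # ~19.3 us
```

So the merge step alone costs on the order of 20 microseconds per cell touched (live and tombstoned alike), which is what makes the total scale so steeply with partition size.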
Upvotes: 1
Views: 1822
Reputation: 19377
You're not just "reading 0.5MB sequentially": you're asking Cassandra to deserialize it into rows, filter out tombstones (deleted data), and assemble a result set. 0.04ms per row is pretty reasonable; my rule of thumb is 0.5ms per 10 rows for an entire query.
Remember that Cassandra is optimized for short requests suitable for online applications; result sets of 10 to 100 rows are typical. There is no parallelization within a single query.
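Applying that rule of thumb to the trace in the question (a rough estimate, not a measurement):

```python
# Rule of thumb: ~0.5 ms per 10 rows for a whole query, i.e. 0.05 ms per row.
rows = 5141                        # live cells read in the question's trace
estimated_ms = rows * 0.5 / 10     # ~257 ms
observed_ms = 235030 / 1000        # "Request complete" at source_elapsed 235030 us

print(f"estimated: {estimated_ms:.0f} ms, observed: {observed_ms:.0f} ms")
```

The observed total is in the same ballpark as the rule of thumb, which is the point: this latency is the expected per-row processing cost for a 5000-row partition read, not an I/O problem.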
Upvotes: 2