Ste

Reputation: 419

Cassandra 4.1 read timeout for a simple use case

I am running into read timeout problems which I would not expect.

CREATE KEYSPACE IF NOT EXISTS uzzstore WITH REPLICATION = {
    'class' : 'SimpleStrategy',
    'replication_factor' : '2'
};

CREATE TABLE IF NOT EXISTS uzzstore.chunks(
    id blob PRIMARY KEY,
    size bigint
);

What I see is that sometimes I get a "Query timed out after PT2S", which is a bit unexpected given such a simple use case (i.e. a simple select on the PK and no concurrency at all). I cannot observe any node being particularly loaded or capped in memory.

In debug log, I see messages like:

SELECT * FROM uzzstore.chunks \
  WHERE id = 0xa9d6f7de939aa2ff41a88011717c41c1d369beb314b10ab62f3f09c1cf840864 \
  LIMIT 4935
  ALLOW FILTERING>, \
time 547 msec - slow timeout 500 msec/cross-node

Is this normal? Is there anything I should optimize to prevent it? Am I doing anything wrong? Of course, my point is not to increase the timeout unless there are good reasons to do so, but to understand how to scale the system to support the use case.

PS: it looks like Cassandra turns SELECT ... WHERE id IN (...) into SELECT ... WHERE id = ? ALLOW FILTERING - should I use the latter directly in my code? Does ALLOW FILTERING impact performance?

Upvotes: 1

Views: 221

Answers (2)

Erick Ramirez

Reputation: 16393

The behaviour you're seeing is expected because your query requires a scatter-gather operation; it is working as designed.

When using the IN() operator on partition keys, the coordinator has to fire off multiple requests, one for each key, wait for the responses from the replicas, combine the results, then return the result set to the client. This is a very expensive operation because it has to retrieve data from several replicas distributed across the cluster.
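A small sketch (plain Python, not Cassandra code) of why this gets unpredictable: the coordinator can only return once the slowest sub-request has answered, so the more keys in the IN(), the more chances one straggler replica drags the whole query toward the timeout. The latency numbers below are made up for illustration.

```python
def coordinator_latency(per_key_latencies_ms):
    """Scatter-gather: the coordinator's total wait is roughly the
    slowest of its per-key sub-requests, not the average."""
    return max(per_key_latencies_ms)

# One key: the query is as fast as that single replica read.
single = coordinator_latency([12])

# Five keys: one slow replica (48 ms) dominates the whole query,
# even though the other four answered in ~10 ms.
fanout = coordinator_latency([12, 9, 48, 15, 11])
```

With 50 keys the maximum of 50 samples is drawn from the tail of the latency distribution, which is why the observed times vary so much from run to run.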

This is incorrect usage of the IN() operator. In my experience, 2 or 3 keys are fine, but with anything more the performance becomes very unpredictable.

The recommended usage of the IN() operator is to filter rows that belong to a single partition, for example:

SELECT ... FROM ...
  WHERE partition_key = ?
  AND clustering_column IN(...)

As a side note, the debug message you posted looks to me like it was generated by o.a.c.db.monitoring.MonitoringTask, so I would ignore the reference to ALLOW FILTERING. MonitoringTask aggregates all the slow queries from the last 5000 ms (the default).

The aggregated operation in the log isn't the actual query that ran -- it's a "summarised" version of the queries. In any case, when the filter on a query is the partition key, ALLOW FILTERING doesn't have any effect, so it's nothing to be concerned about.

The real issue, it seems, is that the data was not modelled correctly, so you need to retrieve multiple partitions in order to satisfy the application query. Cassandra is designed for very fast reads and writes at internet scale, but in order to achieve that, the data model has to match the application query.

You need to redesign the table so that the data needed by the app query is clustered in a single partition. Cheers!
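A hypothetical reshaping along those lines (the batch_id column and its values are illustrative assumptions, since the question doesn't say how the ids are grouped by the application):

```cql
-- Hypothetical: all chunks a single app query needs share one
-- partition, so one single-partition read replaces the multi-key IN().
CREATE TABLE IF NOT EXISTS uzzstore.chunks_by_batch (
    batch_id text,      -- partition key: one app query, one partition
    id       blob,      -- clustering column
    size     bigint,
    PRIMARY KEY (batch_id, id)
);

SELECT id, size
  FROM uzzstore.chunks_by_batch
 WHERE batch_id = 'batch-0042';
```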

Upvotes: 1

Aaron

Reputation: 57808

So I would not use the IN clause with 50 entries. In fact, I would not go beyond single digits.

When faced with a multi-key query, Cassandra cannot determine which single node is responsible for all of the data requested. So it picks one (called the "coordinator"). That one node both handles communication with the other nodes and builds the result set. If it runs out of available resources for one or more of those tasks, it times out.

Instead of running 1 query for 50 rows, try running 50 queries for 1 row.
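A minimal sketch of that fan-out in Python. The fetch_chunk stub stands in for the driver call so the example is self-contained; with the real DataStax Python driver you would issue a prepared single-key SELECT per id (e.g. via session.execute_async) instead.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(chunk_id):
    """Stub for one single-partition read, i.e.
    SELECT size FROM uzzstore.chunks WHERE id = ?
    A real implementation would call the Cassandra driver here."""
    return (chunk_id, len(chunk_id))  # placeholder result

# 50 illustrative 4-byte ids standing in for the blob keys.
ids = [bytes([i]) * 4 for i in range(50)]

# 50 single-key queries in parallel: each one is routed straight to
# the replicas owning that key, with no scatter-gather coordinator
# building a combined result set in the middle.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = dict(pool.map(fetch_chunk, ids))
```

Each individual query stays cheap and token-aware drivers can send it directly to a replica, so one slow key delays only its own request instead of the whole batch.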

Also, with 3 nodes I would build my keyspaces with RF=3. Some operations default to QUORUM consistency (if not specified), and quorum of 2 is...well, 2.
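On a 3-node cluster that keyspace definition could look like the following (NetworkTopologyStrategy is the usual recommendation over SimpleStrategy outside of toy setups; the datacenter name dc1 is an assumption, check yours with nodetool status):

```cql
-- RF=3 on 3 nodes: QUORUM (2 of 3) still succeeds with one node down.
CREATE KEYSPACE IF NOT EXISTS uzzstore WITH REPLICATION = {
    'class' : 'NetworkTopologyStrategy',
    'dc1'   : '3'
};
```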

Upvotes: 2
