visingh
visingh

Reputation: 233

Cassandra timeout during read query (19 million result) at consistency ONE

I have Cassandra cluster with 2 node. And my table structure is <key, Map<list, timestamp>>. I am trying to fetch all key that contains given list. My query look like

Statement select = QueryBuilder.select().all().from(tableName).where(QueryBuilder.containsKey("list", value)); select.setFetchSize(50000);

but i am getting cassandra timeout during read query.

I can decrease setFetchSize but it taking too much time to process 19 million row.

  1. Can any one please suggest correct way to solve this problem?
  2. is there any alternative available for this kind of problem?

Cassandra version = Cassandra 2.2.1

Upvotes: 1

Views: 1032

Answers (1)

doanduyhai
doanduyhai

Reputation: 8812

Cassandra data modeling best practices recommend not to use collections (list, set, map) to store a massive amount of data. The reason is that when loading the CQL row (SELECT ... WHERE id=xxx) Cassandra server has to load the entire collection in memory.

Now to answer your questions:

  1. Can any one please suggest correct way to solve this problem?

Using secondary index to retrieve a huge data set (19 millions) isn't the best approach for your problem.

If your requirement is: give me all list which contains an item, the following schemas may be more appropriate

Solution 1: manual denormalization

CREATE TABLE base_table(
   id text,
   key int,
   value timestamp,
   PRIMARY KEY(id, key)
);

CREATE TABLE denormalized_table_for_searching(
  key int,
  id text
  value timestamp,
  PRIMARY KEY(key, id));

// Give me all couples (id,value) where key = xxx
// Use iterator to fetch data by page and not load 19 millions row at once !!
SELECT * FROM denormalized_table_for_searching WHERE key=xxx; 

Solution 2: automatic denormalization with Cassandra 3.0 materialized views

CREATE TABLE base_table(
   id text,
   key int,
   value timestamp,
   PRIMARY KEY(id, key)
);

CREATE MATERIALIZED VIEW denormalized_table_for_searching
AS SELECT * FROM base_table
WHERE id IS NOT NULL AND key IS NOT NULL
PRIMARY KEY(key, id);

// Give me all couples (id,value) where key = xxx
// Use iterator to fetch data by page and not load 19 millions row at once !!
SELECT * FROM denormalized_table_for_searching WHERE key=xxx; 
  1. is there any alternative available for this kind of problem?

See answer for point 1. above :)

Upvotes: 1

Related Questions