Reputation: 233
I have a Cassandra cluster with 2 nodes, and my table structure is <key, Map<list, timestamp>>
. I am trying to fetch all keys whose map contains a given list key. My query looks like:
Statement select = QueryBuilder.select().all().from(tableName)
        .where(QueryBuilder.containsKey("list", value));
select.setFetchSize(50000);
but I am getting a Cassandra timeout during the read query.
I can decrease the fetch size, but then it takes too much time to process 19 million rows.
Cassandra version = Cassandra 2.2.1
Upvotes: 1
Views: 1032
Reputation: 8812
Cassandra data modeling best practices recommend not using collections (list, set, map) to store a massive amount of data. The reason is that when reading a CQL row (SELECT ... WHERE id=xxx), the Cassandra server has to load the entire collection into memory.
Now to answer your questions:
- Can anyone please suggest a correct way to solve this problem?
Using a secondary index to retrieve a huge data set (19 million rows) isn't the best approach for your problem.
If your requirement is give me all rows whose collection contains a given item, the following schemas may be more appropriate:
Solution 1: manual denormalization
CREATE TABLE base_table(
    id text,
    key int,
    value timestamp,
    PRIMARY KEY(id, key)
);
CREATE TABLE denormalized_table_for_searching(
    key int,
    id text,
    value timestamp,
    PRIMARY KEY(key, id)
);
// Give me all couples (id, value) where key = xxx
// Use an iterator to fetch the data page by page, do not load 19 million rows at once!
SELECT * FROM denormalized_table_for_searching WHERE key=xxx;
Solution 2: automatic denormalization with Cassandra 3.0 materialized views
CREATE TABLE base_table(
    id text,
    key int,
    value timestamp,
    PRIMARY KEY(id, key)
);
CREATE MATERIALIZED VIEW denormalized_table_for_searching
AS SELECT * FROM base_table
WHERE id IS NOT NULL AND key IS NOT NULL
PRIMARY KEY(key, id);
// Give me all couples (id, value) where key = xxx
// Use an iterator to fetch the data page by page, do not load 19 million rows at once!
SELECT * FROM denormalized_table_for_searching WHERE key=xxx;
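With either schema, the client side should rely on the driver's automatic paging instead of pulling all 19 million rows in one go. Below is a minimal sketch with the DataStax Java driver (assuming driver 3.x; the contact point, keyspace name, and search key `42` are placeholders for your own settings):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.querybuilder.QueryBuilder;

public class PagedSearch {
    public static void main(String[] args) {
        // "127.0.0.1" and "my_keyspace" are placeholders for your cluster settings
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {

            Statement select = QueryBuilder.select().all()
                    .from("denormalized_table_for_searching")
                    .where(QueryBuilder.eq("key", 42)); // 42 stands in for your search key
            // Moderate page size: each round trip fetches only this many rows
            select.setFetchSize(1000);

            // The driver transparently requests the next page as the iterator
            // advances, so memory usage stays bounded regardless of row count.
            ResultSet rs = session.execute(select);
            for (Row row : rs) {
                System.out.println(row.getString("id") + " -> " + row.getTimestamp("value"));
            }
        }
    }
}
```

This keeps the server from materializing the whole result set for one response: with a fetch size of 1000, a timeout can only affect a single small page, which can be retried, rather than failing the entire 19-million-row scan.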
- Is there any alternative available for this kind of problem?
See the answer to point 1 above :)
Upvotes: 1