Reputation: 198
I have a table like this.
> CREATE TABLE docyard.documents (
> document_id text,
> namespace text,
> version_id text,
> created_at timestamp,
> path text,
> attributes map<text, text>,
> PRIMARY KEY (document_id, namespace, version_id, created_at) ) WITH CLUSTERING ORDER BY (namespace ASC, version_id ASC, created_at ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32'}
> AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99.0PERCENTILE';
I want to be able to run range queries like the following:
select * from documents where namespace = 'something' and created_at > 'some-value' order by created_at allow filtering;
select * from documents where namespace = 'something' and path = 'something' and created_at > 'some-value' order by created_at allow filtering;
I have not been able to make these queries work in any form, and I have tried secondary indexes as well. Can anyone please help?
I keep getting one error or another when trying to make it work.
Upvotes: 1
Views: 625
Reputation: 57798
First of all, don't use secondary indexes or ALLOW FILTERING. With time-series data, they will perform terribly over time.
To satisfy your first query, you will want to restructure your PRIMARY KEY and CLUSTERING ORDER like this:
PRIMARY KEY (namespace, created_at, document_id) )
WITH CLUSTERING ORDER BY (created_at DESC, document_id ASC);
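As a fuller sketch of that restructuring, the complete table and the first query might look like the following. The table name documents_by_namespace and the timestamp literal are assumptions made for illustration; the columns are taken from the table in the question.

```cql
-- Hypothetical table name; columns copied from the question's table.
CREATE TABLE docyard.documents_by_namespace (
    namespace text,
    created_at timestamp,
    document_id text,
    version_id text,
    path text,
    attributes map<text, text>,
    PRIMARY KEY (namespace, created_at, document_id)
) WITH CLUSTERING ORDER BY (created_at DESC, document_id ASC);

-- Range query on the first clustering column within one partition.
-- The timestamp value is a placeholder.
SELECT * FROM docyard.documents_by_namespace
WHERE namespace = 'something'
  AND created_at > '2016-01-01 00:00:00+0000';
```

Rows come back newest first. Note that version_id is not part of the key in this layout, so two versions of the same document written with an identical created_at would overwrite each other; adding version_id as a final clustering column may be worth considering if that can happen.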
This will allow for the following:
- Partitioning by namespace.
- Sorting by created_at in DESCending order (most-recent rows read first).
- Uniqueness by document_id.
- No need for ALLOW FILTERING or ORDER BY in your query, as the necessary keys will be provided, and the results will already be sorted to your CLUSTERING ORDER.

For your second query, I would create an additional query table. This is because in Cassandra, you need to model your tables to suit your queries. You may end up having several query tables for the same data, and that's ok.
CREATE TABLE docyardbypath.documents (
document_id text,
namespace text,
version_id text,
created_at timestamp,
path text,
attributes map<text, text>,
PRIMARY KEY ((namespace, path), created_at, document_id) )
WITH CLUSTERING ORDER BY (created_at DESC, document_id ASC);
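Against that table, the second query could be written roughly as follows. This is a sketch: the literal values are placeholders, and it assumes both namespace and path are always known at query time, since together they form the partition key.

```cql
-- Both partition key columns are restricted by equality,
-- followed by a range on the first clustering column.
SELECT * FROM docyardbypath.documents
WHERE namespace = 'something'
  AND path = 'something'
  AND created_at > '2016-01-01 00:00:00+0000';
```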
This will:
- Partition by both namespace and path.
- Allow rows within unique combinations of namespace and path to be sorted according to your CLUSTERING ORDER.
- Again remove the need for ALLOW FILTERING or ORDER BY in your query.
Upvotes: 3
Reputation: 3760
I think you need to review how data modeling works in Cassandra.
The first query can look like this:
select * from documents where namespace = 'something' and created_at > 'some_formatted_date' and document_id='someid' and version_id='some_version' order by namespace, version_id, created_at allow filtering;
When querying a Cassandra table, you must:
- Restrict the partition key (document_id here) in your select.
- Order by following the clustering order.

Fixing the second query is straightforward. What are you trying to do? Cassandra is optimized for write performance. You may want to write this data into multiple tables, one for each group of queries you plan to run.
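One way to keep several query tables in step is to write to all of them together. The sketch below uses a logged batch, which is one option and not necessarily what the answer had in mind. It assumes the two tables that appear in this thread (docyard.documents and docyardbypath.documents), and every value is a placeholder.

```cql
-- Logged batch: if any of the writes is applied, all of them eventually are.
BEGIN BATCH
  INSERT INTO docyard.documents (document_id, namespace, version_id, created_at, path)
  VALUES ('doc-1', 'something', 'v1', '2016-01-01 00:00:00+0000', '/some/path');
  INSERT INTO docyardbypath.documents (document_id, namespace, version_id, created_at, path)
  VALUES ('doc-1', 'something', 'v1', '2016-01-01 00:00:00+0000', '/some/path');
APPLY BATCH;
```

Logged batches that span partitions cost more than individual writes, so this trades some write throughput for consistency between the tables.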
Upvotes: 1