Reputation: 21
I have table -
CREATE TABLE IF NOT EXISTS Chat(
id UUID,
time timestamp,
idSender UUID,
message varchar,
PRIMARY KEY ((id),time))
WITH CLUSTERING ORDER BY(time DESC);
And I want to sort my messages by time. But Cassandra doesn't sort my table, when I run:
select * from chat order by time
It shows the message
Error from server: code=2200 [Invalid query]
message="ORDER BY is only supported when the
partition key is restricted by an EQ or an IN.
How I can sort my table by time?
Upvotes: 2
Views: 4867
Reputation: 57748
To be clear, Cassandra uses clustering keys (time
in your case) to enforce on-disk sort order. But it can only enforce this order within a partition key.
So this should work:
SELECT * FROM chat WHERE id = db14789e-ede0-4852-a397-d0ccc7d8349e;
Note, that you do not need to specify the ORDER BY
, as you have done that already in your table definition. Unless of course you want to flip the sort direction (ASCending vs. DESCending).
If that is not helpful, then you'll need to rebuild your table with a different partition key, one which it does makes sense to partition your data by. Then queries by that partition key will return sorted by time
. But in Cassandra, you cannot simply select all rows in a table, and expect it to be primarily sorted by anything other than the hashed token value of the partition key.
I don't use it because id is unique
Pro-tip: If you're not going to query by it, then it doesn't make sense to use it as your partitioning key.
need to get last 100 messages
So probably the best way to go about that, is to use a partitioning technique known as "bucketing." In this case, we could (for example) create a bucket by day, which would look something like this:
CREATE TABLE IF NOT EXISTS chat_by_day(
day TEXT,
id UUID,
time timestamp,
idSender UUID,
message TEXT,
PRIMARY KEY ((day),time,id))
WITH CLUSTERING ORDER BY(time DESC,id ASC);
After inserting some data, I can then query by day, and see the results sorted by time:
cassdba@cqlsh:stackoverflow> SELECT time,message FROM chat_by_day
WHERE day='20180621' LIMIT 100;
time | message
---------------------------------+--------------------------------------------------------
2018-06-21 14:51:14.863000+0000 | No...I am your father.
2018-06-21 14:51:14.858000+0000 | If only you could see what I have seen with your eyes.
2018-06-21 14:51:14.854000+0000 | Game over man!
2018-06-21 14:50:13.369000+0000 | There can be only one.
(4 rows)
Now this was just an example, but hopefully it points you in the right direction.
It is strange that it is impossible sort by time
Well, that is a design consideration going back to how Cassandra was built. Distributed databases may still have all replicas on one node (like Neo4j), but sharded databases (by definition) do not. And in a large system, you definitely do not want a "hot" node (one node with more data than another), so it makes more sense to build it to re-order the data in a way which facilitates even distribution. With that in mind, it becomes easier to see why overall data order/sorting quickly becomes a secondary consideration.
Upvotes: 5