Reputation: 600
I have the following problem: In my Cassandra database I have several messages sent by several users. My messages table has the following structure:
CREATE TABLE messages (
recipient bigint,
sender bigint,
created_at text,
content text,
PRIMARY KEY((recipient, sender),created_at)
);
I need to count the number of messages sent by a user in a day. For example, between the date 2017-01-01 and 2017-01-05
sender | created_at
1 2017-01-01
1 2017-01-01
2 2017-01-01
3 2017-01-02
3 2017-01-02
4 2017-01-03
4 2017-01-04
5 2017-01-04
I would have the result
2017-01-01 = 2
2017-01-02 = 1
2017-01-03 = 1
2017-01-04 = 2
Upvotes: 1
Views: 68
Reputation: 5180
From what I can see, you cannot do that with your table structure because your partition key contains the recipient
. To be told, you should not count at all, because counting keys in cassandra is hard.
However, if you insist on counting these keys, I suggest you two approaches:
CREATE TABLE counters_by_user (
sender bigint,
ts timestamp,
messages counter,
PRIMARY KEY (sender, ts)
)
This table will allow you to directly fetch the value you are looking for. It allows you to select the appropriate "granularity" of the counters, that is if you want a day-by-day counter simply store the timestamp in the ts
field in the yyyy-mm-dd
format. If you want hourly-based counting, store it in yyyy-mm-dd HH:00
format, etc... You'll need the exact sender
only to fetch the results, and can range query by specifying the ts
component of the primary key. Have a look at the Counters page documentation on how to use them, and beware that the main drawback of this approach is that Cassandra can over/under count, so watch your steps if you need to be pedantic on counting.
CREATE TABLE messages_by_sender (
sender bigint,
created_at timestamp,
PRIMARY KEY (sender, created_at)
);
Every time you insert a row in the messages
table you'll insert a row here, and when you need to count the messages sent you simply run a SELECT COUNT(*) FROM messages_by_sender WHERE sender=?
to count them all, or SELECT COUNT(*) FROM messages_by_sender WHERE sender=? AND created_at > ? AND created_at < ?;
to specify a range. If you have a lot of messages per sender
, however this will lead to inefficiency, since counting keys in Cassandra requires a partition scan.
Upvotes: 3
Reputation: 968
AFAIK, cassandra doesn't allow do SQL-like aggregations. You either need to pre-compute & keep into another database OR do the counting as the results are being scanned/queried.
Another option it to integrate cassandra with hadoop & do some map-reduce, but that seems like an overkill for what you're trying here.
Upvotes: 1