Create Cassandra CQL with IN and ORDER BY

Question

I need a CQL to get all rows from the table based on set of current user friends (I'm using IN for that) and sort them based on created date.

I'm trying to play with key and clustering key, but got no ideas.

Here is my Cassandra table:

CREATE TABLE chat.news_feed(
  id_news_feed                           uuid,
  id_user_sent         uuid,
  first_name text,
  last_name text,
  security int,
  news_feed text, 
  image blob,
  image_preview text,
  image_name text,
  image_length int,
  image_resolution text,
  is_image int,
  created_date        timestamp,
  PRIMARY KEY ((id_news_feed, id_user_sent), created_date))
    WITH CLUSTERING ORDER BY (created_date DESC) AND comment = 'List of all news feed by link id';

and here is my CQL (formed in Java):

SELECT JSON id_news_feed,  first_name, last_name, id_user_sent, news_feed, image_name, image_preview, image_length, created_date, is_image, image_resolution FROM chat.news_feed WHERE id_user_sent in (b3306e3f-1f1d-4a87-8a64-e22d46148316,b3306e3f-1f1d-4a87-8a64-e22d46148316) ALLOW FILTERING;

I coul not run it cause there is no key in my WHERE part of CQL.

Is there any way how I could get all rows created by set of users with Order By (I tried to create table different ways, but no results yet)?

Thank you!

Revaz · Accepted Answer

Unlike the relational databases here you will probably need denormalization of the tables. First of all, you cannot effectively query everything from a single table. Also Cassandra does not support joins natively. I suggest to split up your table into several.

Let's start with the friends: the current user id should be part of the primary key and the friends should go as a clustering column.

CREATE TABLE chat.user_friends (
  user_id uuid,
  friend_id uuid,
  first_name text,
  last_name text,
  security int,
  PRIMARY KEY ((user_id), friend_id));

Now you can find the friend for each particular user by querying as follows:

SELECT * FROM chat.user_friends WHERE user_id = 'a001-...';

or

SELECT * FROM chat.user_friends WHERE user_id = 'a001-...' and friend_id in ('a121-...', 'a156-...', 'a344-...');

Next let's take care of news feed: before putting remaining columns into this table I'd think about the desired query against this table. The news feeds needs to be filtered by the user ids with IN listing and at the same time be sortable by time. So we put the created_date timestamp as a clustering key and friends' user_id as a partitioning key. Note that the timestamps will be sorted per user_id not globally (you can re-sort those on the client side). What's really important is to keep news_feed_id out of the primary key. This column still may contain uuid which is unique, but as long as we don't want to query this table to get a particular news feed by id. For this purpose We'd anyway require separate table (denormalization of the data) or materialized view (which I will not cover in this answer but is quite nice solution for some types of denormalization introduced in Cassandra 3.0).

Here is the updated table:

CREATE TABLE chat.news_feed(      
  id_user_sent uuid,
  first_name text,
  last_name text,
  security int,
  id_news_feed uuid,
  news_feed text, 
  image blob,
  image_preview text,
  image_name text,
  image_length int,
  image_resolution text,
  is_image int,
  created_date        timestamp,
  PRIMARY KEY ((id_user_sent), created_date))
    WITH CLUSTERING ORDER BY (created_date DESC) AND comment = 'List of all news feed by link id';

Some example dataset:

cqlsh:ks_test> select * from news_feed ;

 id_user_sent                         | created_date                    | first_name | id_news_feed                         | image | image_length | image_name | image_preview | image_resolution | is_image | last_name | news_feed | security
--------------------------------------+---------------------------------+------------+--------------------------------------+-------+--------------+------------+---------------+------------------+----------+-----------+-----------+----------
 01b9b9e8-519c-4578-b747-77c8d9c4636b | 2017-02-23 00:00:00.000000+0000 |       null | fd25699c-78f1-4aee-913a-00263912fe18 |  null |         null |       null |          null |             null |     null |      null |      null |     null
 9bd23d16-3be3-4e27-9a47-075b92203006 | 2017-02-21 00:00:00.000000+0000 |       null | e5d394d3-b67f-4def-8f1e-df781130ea22 |  null |         null |       null |          null |             null |     null |      null |      null |     null
 6e05257d-9278-4353-b580-711e62ade8d4 | 2017-02-25 00:00:00.000000+0000 |       null | ec34c655-7251-4af8-9718-3475cad18b29 |  null |         null |       null |          null |             null |     null |      null |      null |     null
 6e05257d-9278-4353-b580-711e62ade8d4 | 2017-02-22 00:00:00.000000+0000 |       null | 5342bbad-0b55-4f44-a2e9-9f285d16868f |  null |         null |       null |          null |             null |     null |      null |      null |     null
 6e05257d-9278-4353-b580-711e62ade8d4 | 2017-02-20 00:00:00.000000+0000 |       null | beea0c24-f9d6-487c-a968-c9e088180e73 |  null |         null |       null |          null |             null |     null |      null |      null |     null
 63003200-91c0-47ba-9096-6ec1e35dc7a0 | 2017-02-21 00:00:00.000000+0000 |       null | a0fba627-d6a7-463c-a00c-dd0472ad10c5 |  null |         null |       null |          null |             null |     null |      null |      null |     null

And the filtered one:

cqlsh:ks_test> select * from news_feed where id_user_sent in (01b9b9e8-519c-4578-b747-77c8d9c4636b, 6e05257d-9278-4353-b580-711e62ade8d4) and created_date >= '2017-02-22';

 id_user_sent                         | created_date                    | first_name | id_news_feed                         | image | image_length | image_name | image_preview | image_resolution | is_image | last_name | news_feed | security
--------------------------------------+---------------------------------+------------+--------------------------------------+-------+--------------+------------+---------------+------------------+----------+-----------+-----------+----------
 01b9b9e8-519c-4578-b747-77c8d9c4636b | 2017-02-25 00:00:00.000000+0000 |       null | 26dc0952-0636-438f-8a26-6a3fef4fb808 |  null |         null |       null |          null |             null |     null |      null |      null |     null
 01b9b9e8-519c-4578-b747-77c8d9c4636b | 2017-02-23 00:00:00.000000+0000 |       null | fd25699c-78f1-4aee-913a-00263912fe18 |  null |         null |       null |          null |             null |     null |      null |      null |     null
 6e05257d-9278-4353-b580-711e62ade8d4 | 2017-02-25 00:00:00.000000+0000 |       null | ec34c655-7251-4af8-9718-3475cad18b29 |  null |         null |       null |          null |             null |     null |      null |      null |     null
 6e05257d-9278-4353-b580-711e62ade8d4 | 2017-02-22 00:00:00.000000+0000 |       null | 5342bbad-0b55-4f44-a2e9-9f285d16868f |  null |         null |       null |          null |             null |     null |      null |      null |     null

P.S. As you might notice we got rid of the ALLOW FILTERING clause. Don't use ALLOW FILTERING in any application as it has significant performance penalty. This is only usable to look up some small chunk of data scattered around in different partitions.

Create Cassandra CQL with IN and ORDER BY

Answers (1)

Related Questions