How to change Cassandra indexes to allow query

Question

I'm using Cassandra to store pageviews in a very simple matter. However I'm unable to perform the queries I want due to indices which I set up wrongfully.

CREATE TABLE my_site.pageviews (
    url text,
    createdat timestamp,
    userid text,
    PRIMARY KEY ((url, createdat, userid))
)

I found out the hard way that I wasn't able to query, unless I specified all parts of the primary key in a query.

How do I need to configure my PRIMARY KEY to allow for these queries?:

SELECT * FROM pageviews WHERE url = ? AND createdat > ?

SELECT * FROM pageviews WHERE userid = ? AND createdat > ?

Any guidance would be greatly appreciated!

Patrick McFadin · Accepted Answer

For the types of lookups you will need to use two tables that will look like this:

CREATE TABLE my_site.pageviews_by_url (
    url text,
    createdat timestamp,
    userid text,
    PRIMARY KEY ((url), createdat, userid)
)

CREATE TABLE my_site.pageviews_by_userid (
    url text,
    createdat timestamp,
    userid text,
    PRIMARY KEY ((userid),createdat,url)
)

If you'll notice the subtle difference. Table "pageviews_by_url" has only 'url' as a partition key and the other columns as clustering keys. Table "pageviews_by_userid" is mostly the same table but with 'userid' as the partition key. On a SELECT query, using an = on the partition key and then the > on the first clustering column, will get you the results you are looking for when you do the following queries:

SELECT * FROM pageviews WHERE url = ? AND createdat > ?

SELECT * FROM pageviews WHERE userid = ? AND createdat > ?

When inserting data into these tables, it's a good case to use a BATCH statement. Insert to both tables at the same time.

How to change Cassandra indexes to allow query

Answers (1)

Related Questions