Reputation: 17804
I'm using Cassandra to store pageviews in a very simple matter. However I'm unable to perform the queries I want due to indices which I set up wrongfully.
CREATE TABLE my_site.pageviews (
url text,
createdat timestamp,
userid text,
PRIMARY KEY ((url, createdat, userid))
)
I found out the hard way that I wasn't able to query, unless I specified all parts of the primary key in a query.
How do I need to configure my PRIMARY KEY to allow for these queries?:
SELECT * FROM pageviews WHERE url = ? AND createdat > ?
SELECT * FROM pageviews WHERE userid = ? AND createdat > ?
Any guidance would be greatly appreciated!
Upvotes: 1
Views: 82
Reputation: 1351
For the types of lookups you will need to use two tables that will look like this:
CREATE TABLE my_site.pageviews_by_url (
url text,
createdat timestamp,
userid text,
PRIMARY KEY ((url), createdat, userid)
)
CREATE TABLE my_site.pageviews_by_userid (
url text,
createdat timestamp,
userid text,
PRIMARY KEY ((userid),createdat,url)
)
If you'll notice the subtle difference. Table "pageviews_by_url" has only 'url' as a partition key and the other columns as clustering keys. Table "pageviews_by_userid" is mostly the same table but with 'userid' as the partition key. On a SELECT query, using an = on the partition key and then the > on the first clustering column, will get you the results you are looking for when you do the following queries:
SELECT * FROM pageviews WHERE url = ? AND createdat > ?
SELECT * FROM pageviews WHERE userid = ? AND createdat > ?
When inserting data into these tables, it's a good case to use a BATCH statement. Insert to both tables at the same time.
Upvotes: 1