Reputation: 381
I am designing a database for a forum type application. I am thinking of using Cassandra (I am going to have a huge amount of data, so RDBMS might not be a viable option) but stuck at how to design the db. The main functionality would be search i.e. searching for keyword and all posts with that keyword will be returned. Search could also be by some characteristics of the user e.g reputation, geography etc.
What I have thought so far:
where col1 = keyword or col2 = keyword ...
But I think this is really a round-about way to go about it. Any better ideas?
Thanks!
Upvotes: 0
Views: 350
Reputation: 20021
To define a schema in Cassandra is very important to know what queries you're gonna do.
1 - A Primary Key made of X fields doesn't mean you'll be able to query for any field of the primary key. If you define
PRIMARY KEY(reputation, geography, category)
Using such a key you can not query for category unless you know both reputation and geography. You can not query for geography unless you know reputation. So, reading left-to-rigth, you can put fields in WHERE
conditions.
select * from abc where reputation = 'good';
select * from abc where reputation = 'good' and geography = 'usa';
select * from abc where reputation = 'good' and geography = 'usa' and category='pizza';
Any other combination is not allowed. To solve this problem in cassandra you should denormalize, putting your data into different tables. For instance 3 tables with 3 different primary keys
table info_by_category ... PRIMARY KEY(category, geography, reputation)
table info_by_reputation ... PRIMARY KEY(reputation, category, geography)
table info_by_geography PRIMARY KEY(geography, reputation, category)
Now you should choose which table to query based on the where conditions. If you have reputation and geography you should query table info_by_geography
, if you have geography and category you should go to info_by_category
2 - Cassandra is not suitable to perform full-text searches: to do the job you should use a full-text search engine (like solr/elasticsearch/whateveryouprefer) beside your db. More, even if you put many lookup indexes (formerly secondary idexes) you can not combine them with AND
operator. And many L.I. is a deprecated behaviour -- if you can't use a full-text search I'd rather use a Map, putting a secondary index on Map value. Said that, you still won't be able to search for different words using AND
operator.
3 - Can be easily done using a L.I.
HTH, Carlo
Upvotes: 1
Reputation: 6495
Remember, just coz you have lots of clustering columns doesn't mean you can search on them. If your primary key is (p, c1, c2, c3...) then to search on c2, you must provide c1. To search on c3, you must provide c1 and c2...
Secondary indexes are good for looking up data inside a partition. If your query is cross partition, you'll run into annoying hassles.
Cassandra works best when you have an idea about your queries. You create your physical data model according to those queries. Searching for everything on everything is something that should be left to search solutions. Identify your queries and keep that info in cassandra. Use Solr or ElasticSearch to do your arbitrary searches.
Upvotes: 0