yogk
yogk

Reputation: 381

Cassandra data model for forum-type application

I am designing a database for a forum type application. I am thinking of using Cassandra (I am going to have a huge amount of data, so RDBMS might not be a viable option) but stuck at how to design the db. The main functionality would be search i.e. searching for keyword and all posts with that keyword will be returned. Search could also be by some characteristics of the user e.g reputation, geography etc.

What I have thought so far:

  1. Primary key: Composite of date and all other searchable features e.g. geography, reputation etc. except the words in the post so that all other features can be easily searched
  2. Secondary indices: Put all words in separate columns and make all of them secondary index therefore to search for "keyword" I could do where col1 = keyword or col2 = keyword ...
  3. All posts with the column name = the second the message was posted (or column names = post IDs)

But I think this is really a round-about way to go about it. Any better ideas?

Thanks!

Upvotes: 0

Views: 350

Answers (2)

Carlo Bertuccini
Carlo Bertuccini

Reputation: 20021

To define a schema in Cassandra is very important to know what queries you're gonna do.

1 - A Primary Key made of X fields doesn't mean you'll be able to query for any field of the primary key. If you define

PRIMARY KEY(reputation, geography, category)

Using such a key you can not query for category unless you know both reputation and geography. You can not query for geography unless you know reputation. So, reading left-to-rigth, you can put fields in WHERE conditions.

select * from abc where reputation = 'good';
select * from abc where reputation = 'good' and geography = 'usa';
select * from abc where reputation = 'good'  and geography = 'usa' and category='pizza';

Any other combination is not allowed. To solve this problem in cassandra you should denormalize, putting your data into different tables. For instance 3 tables with 3 different primary keys

table info_by_category ... PRIMARY KEY(category, geography, reputation)
table info_by_reputation ... PRIMARY KEY(reputation, category, geography)
table info_by_geography PRIMARY KEY(geography, reputation, category)

Now you should choose which table to query based on the where conditions. If you have reputation and geography you should query table info_by_geography, if you have geography and category you should go to info_by_category

2 - Cassandra is not suitable to perform full-text searches: to do the job you should use a full-text search engine (like solr/elasticsearch/whateveryouprefer) beside your db. More, even if you put many lookup indexes (formerly secondary idexes) you can not combine them with AND operator. And many L.I. is a deprecated behaviour -- if you can't use a full-text search I'd rather use a Map, putting a secondary index on Map value. Said that, you still won't be able to search for different words using AND operator.

3 - Can be easily done using a L.I.

HTH, Carlo

Upvotes: 1

ashic
ashic

Reputation: 6495

  1. Remember, just coz you have lots of clustering columns doesn't mean you can search on them. If your primary key is (p, c1, c2, c3...) then to search on c2, you must provide c1. To search on c3, you must provide c1 and c2...

  2. Secondary indexes are good for looking up data inside a partition. If your query is cross partition, you'll run into annoying hassles.

Cassandra works best when you have an idea about your queries. You create your physical data model according to those queries. Searching for everything on everything is something that should be left to search solutions. Identify your queries and keep that info in cassandra. Use Solr or ElasticSearch to do your arbitrary searches.

Upvotes: 0

Related Questions