Reputation: 376
Forgive me for asking something that is probably explained elsewhere, but I am having trouble designing a data model in Cassandra.
I am storing transactions. These transactions each have a source (user), a timestamp, and some associated keywords. I need to be able to find transactions given the source and a date range and (optional) keywords. Cassandra is attractive because I need to store billions of transactions.
I have been unable to find a resource that explains how to do this type of thing. My initial thoughts involve having a few CFs: a transaction CF, a keyword_transaction CF, a source_transaction CF, and possibly a day_transaction CF (or something similar). This would make it very straightforward to find transactions based on any one of the above items, but it doesn't seem like it will let me search on all of the above items at once.
Any thoughts?
Upvotes: 2
Views: 517
Reputation: 5064
Start by thinking about your queries, and then work back to your data model. Read here and here, as this will help when you plan your data model.
cf : transactions
rowkey : source/uuid (suggestion)
cn : source
cv : UTF8
cn : keyword
cv : UTF8
cn : date
cv : DateType
cn : time
cv : DateType
cf : keywords
rowkey : keyword
cn : source
cv : UTF8
Here you have a standard column family called transactions with a few column names (cn) and their corresponding column values (cv). Each transaction is identified by its rowkey. The other standard column family is keywords, where the rowkey is the keyword.
You can search by source, timestamp, or keyword, but you need to index those columns for the queries to work. For example, with the suggested data structure above, you can run these:
get transactions where source = '';
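To make the layout concrete, here is a minimal sketch in plain Python that mimics the two column families with dictionaries. It is not Cassandra client code. The helper name `store_transaction` is hypothetical, and I am assuming the keywords row holds one column per source, which the layout above does not spell out.

```python
import uuid
from collections import defaultdict
from datetime import datetime

# Plain-Python stand-ins for the two column families (not Cassandra client code).
transactions = {}              # rowkey -> {column name: column value}
keywords = defaultdict(dict)   # keyword (rowkey) -> {source: ""}

def store_transaction(source, keyword, when):
    """Write one transaction row and record its source under the keyword row."""
    rowkey = f"{source}/{uuid.uuid4()}"   # the suggested source/uuid rowkey
    transactions[rowkey] = {
        "source": source,
        "keyword": keyword,
        "date": when.date(),
        "time": when,
    }
    # Assumption: one column per source in the keyword row.
    keywords[keyword][source] = ""
    return rowkey

rowkey = store_transaction("alice", "refund", datetime(2012, 3, 1, 9, 30))
print(rowkey, transactions[rowkey])
print(dict(keywords["refund"]))
```

The point of the second dictionary is that every write updates both structures, so a lookup by keyword never has to scan the transactions.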
get transactions where source = '' and date > '';
get transactions where date = '';
get keywords['keyword'];
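The question asks for source plus a date range plus an optional keyword in one search. As a rough sketch of that combined filter, the plain-Python function below applies an equality test on source, a range test on date, and an optional equality test on keyword. The sample rows and the name `find_transactions` are hypothetical, and the code models only the filtering logic, not how Cassandra would execute it.

```python
from datetime import date

# Hypothetical sample rows shaped like the transactions column family.
rows = {
    "alice/1": {"source": "alice", "keyword": "refund", "date": date(2012, 3, 1)},
    "alice/2": {"source": "alice", "keyword": "purchase", "date": date(2012, 3, 15)},
    "alice/3": {"source": "alice", "keyword": "refund", "date": date(2012, 5, 2)},
    "bob/1": {"source": "bob", "keyword": "refund", "date": date(2012, 3, 10)},
}

def find_transactions(rows, source, start, end, keyword=None):
    """Return rowkeys matching source, an inclusive date range, and an optional keyword."""
    matches = []
    for rowkey, columns in rows.items():
        if columns["source"] != source:
            continue  # equality on source
        if not (start <= columns["date"] <= end):
            continue  # range on date
        if keyword is not None and columns["keyword"] != keyword:
            continue  # optional equality on keyword
        matches.append(rowkey)
    return sorted(matches)

print(find_transactions(rows, "alice", date(2012, 3, 1), date(2012, 3, 31)))
print(find_transactions(rows, "alice", date(2012, 3, 1), date(2012, 3, 31), keyword="refund"))
```

Because the rowkey starts with the source, all candidate rows for one user share a prefix, which is what keeps the source filter cheap in this sketch.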
Upvotes: 3