Reputation: 357
I have a use case where data with no natural uniqueness needs to be dumped into a database: say, random data that can contain repeated values and is generated at very high speed.
Cassandra requires every table to have a partition key. I could introduce a TimeUUID column to serve as the key, but then retrieval becomes a problem. That, in turn, could be handled with ALLOW FILTERING in the SELECT statement.
I am looking for a better approach. Can anyone suggest one? The only constraint is that I can only dump the data into Cassandra; a file system is not available.
Upvotes: 1
Views: 49
Reputation: 2996
It seems like you just want to store your data without knowing yet how you will query it. With Cassandra, you typically need to know your queries before you design your data model. If you want to retrieve the full data set, you will get poor performance. You might want to consider HDFS instead.
If you really need to store it in Cassandra, try to think of a way to store it that makes sense. For example, you could store your data in time buckets. Try to size each bucket to hold about 1 MB worth of data. If you produce 1 MB of data per minute, then a one-minute bucket is appropriate. You would have the timestamp truncated to the minute as the partition key, then a TimeUUID as the clustering column, then the rest of your data.
Upvotes: 1