Reputation: 173
I am planning to migrate data from my existing database (Postgres) to Cassandra. Here is a brief overview of the system:
user_id, event_name, timestamp
I am trying to model this data using a few different approaches.
timestamp_year and timestamp_month are added. timestamp is used as a clustering key.
I have tried weekly buckets instead of monthly buckets, as well as pagination, to improve other parameters. But there is one thing I have not been able to sort out: partition size is non-uniform because the 3 data sources produce data at different rates.
How can I keep partition size (almost) consistent in such a data model? Ideas are welcome.
Upvotes: 1
Views: 207
Reputation: 882
This is a classic problem, and there is no easy way to make partition sizes uniform. If you can predict the ingestion rate per user, you can probably group users into buckets such as high, medium, and low ingestion users.
The time bucket then depends on the user's type. For a high ingestion user, a partition covers a day; for a low ingestion user, a partition covers a month.
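A minimal sketch of how the tier could drive the time-bucket part of the partition key. The tier names, the `BUCKET_BY_TIER` mapping, and the `time_bucket` helper are illustrative assumptions; in particular, the weekly bucket for medium users is a guess, since only the high and low cases are specified here.

```python
from datetime import datetime

# Hypothetical mapping from ingestion tier to bucket granularity.
# "high" -> day and "low" -> month follow the suggestion above;
# "medium" -> week is an assumption.
BUCKET_BY_TIER = {"high": "day", "medium": "week", "low": "month"}


def time_bucket(tier: str, ts: datetime) -> str:
    """Return the time-bucket component of the partition key for a tier."""
    granularity = BUCKET_BY_TIER[tier]
    if granularity == "day":
        return ts.strftime("%Y-%m-%d")
    if granularity == "week":
        iso = ts.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    return ts.strftime("%Y-%m")


ts = datetime(2024, 3, 15, 10, 30)
high_bucket = time_bucket("high", ts)
low_bucket = time_bucket("low", ts)
```

The partition key would then be something like (user_id, bucket). The tier of each user has to be stored somewhere readable at both write and read time, so that both sides compute the same bucket.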
To speed up a month query for a high ingestion user, you can run the 30 daily queries in parallel and see whether that improves your query time.
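The parallel fan-out could look roughly like this. `fetch_day` is a hypothetical callable that runs the single-partition query for one day bucket with whatever driver you use; it is not part of any library. The thread pool size is also an assumption to tune.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import Callable, Iterable, List


def days_in_range(start: date, end: date) -> List[date]:
    """Every day bucket from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def query_range_in_parallel(
    fetch_day: Callable[[date], Iterable],
    start: date,
    end: date,
    max_workers: int = 8,
) -> list:
    """Run one query per day bucket concurrently and merge rows in day order."""
    days = days_in_range(start, end)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # so the merged rows stay sorted by day.
        per_day = pool.map(fetch_day, days)
        return [row for rows in per_day for row in rows]


# Stand-in for a real per-partition query, for illustration only.
def fake_fetch_day(day: date) -> list:
    return [day.isoformat()]


rows = query_range_in_parallel(fake_fetch_day, date(2024, 3, 1), date(2024, 3, 31))
```

Each daily query hits exactly one partition, so the work spreads across nodes instead of one large partition scan. Whether that is faster overall depends on cluster size and load, so it is worth measuring.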
Upvotes: 1