Reputation: 677
I'm currently struggling querying be chunk of data that is stored in partitioned table (partition per date)
the data looks like that:
date, product_id, orders
2019-11-01, 1, 100
2019-11-01, 2, 200
2019-11-02, 1, 300
I have hundreds of date-partitions and millions of rows per date.
Now, if I want to query, for instance, total orders for product id 1 and 2 for period of 2 weeks, and group by date (to show in a graph per date), the db has to go to 2 weeks of partitions and fetch the data for them.
That process might be taking a long time when the number of products is big or the time frame required is long.
I have read that AWS Redshift is suitable for this kind of tasks. I'm considering shifting my partitioned tables (aggregated analytics per date) to that technology but I wonder if that's really what I should do to make those queries to run much faster.
Thanks!
Upvotes: 6
Views: 380
Reputation: 1265
Considering your case Amazon Redshift can be a good choice, so does Amazon Athena. But it is also important to consider your application framework. Are you opt moving to Amazon only for Database or you have other Amazon services in the list too?
Also before making the decision please check the cost of Redshift.
Upvotes: 0
Reputation: 269410
If you are struggling with traditional SQL databases, then Amazon Redshift is certainly an option. It can handle tables with billions of rows.
This would involve loading the data from Amazon S3 into Redshift. This will allow Redshift to optimize the way that the data is stored, making it much faster for querying.
Alternatively, you could consider using Amazon Athena, which can query the data directly from Amazon S3. It understands data that is partitioned into separate directories (eg based on date).
Upvotes: 3
Reputation: 436
As per your use case Redshift is really a good choice for you. To gain the best performance out of Redshift, it is very important to set proper distribution and sort key. In your case "date" column should be distribution key and "productid" should be sort key. Another important note, Do not encode "date" and "productid" column. You should get better performance.
Upvotes: 4
Reputation: 311
Which version of PostgreSQL are you using?
Are you using native partioning or inheritance partitioning trigger-based
?
Latest version of postgresql improved partitioning management.
Upvotes: 1