Now that cross database queries are available within one cluster, I'm trying to understand if/when it's worth creating separate databases. Seems like having everything in one database is the best solution in most cases. I haven't found details on cross-database queries but I would think performance would be slower than if it were on one database. My question is, when should I set up multiple databases instead of putting everything in one database. https://aws.amazon.com/about-aws/whats-new/2020/10/announcing-cross-database-queries-for-amazon-redshift-preview/ Limitations - https://docs.aws.amazon.com/redshift/latest/dg/cross-database_limitation.html We have 4 different "products" or ecommerce sites. Related but different. We had thought about having these in separate databases within the same cluster but we are seeing too many downsides to this approach.

Reputation: 6065

Is it better to use one database or multiple databases?

Now that cross database queries are available within one cluster, I'm trying to understand if/when it's worth creating separate databases. Seems like having everything in one database is the best solution in most cases.

I haven't found details on cross-database queries but I would think performance would be slower than if it were on one database.

My question is, when should I set up multiple databases instead of putting everything in one database.

We have 4 different "products" or ecommerce sites. Related but different. We had thought about having these in separate databases within the same cluster but we are seeing too many downsides to this approach.

Upvotes: 1

Answers (2)

Bill Weiner

Reputation: 11032

Like others I only see very limited cases where multiple databases in a Redshift cluster is an advantage. The hardware is shared between these databases so performance demands on one database impact others. These can show up as CPU impacts, disk IO impacts, network impacts, memory impacts, leader node over subscription, or temp space impacts. It can be challenging enough to manage Redshift for a single database at time, why complicate things.

If the data is basically independent then why not set up multiple clusters? Sharing limited data can be handled in other ways. This increases the amount of horsepower you can apply and provide much better hardware isolation.

If there is high levels of data sharing then "databases" just provides one more level of data organization within effectively a single database. Not a bad thing but not really a multiple database use model. Just 2 layers of "schema"s.

There is some additional data protection / isolation / access control which may be a driver in some use cases.

Some may have many small databases that have very light workloads so combining them into one cluster could be a cost savings. A lot of horsepower isn't needed so share it among many database use cases.

I can see some cases where multi-database Redshift could be an advantage but I don't see the driver to move to this model for most current Redshift customers.

Upvotes: 2

Lajos Arpad

Reputation: 76583

If you have separate projects/modules, each with its own data, then it makes sense to create separate databases for them. Example:

Let's assume that you have

project1
project2
...
project1000

each has 1 000 000 users on average, but they are not closely related. Actually it's better for your performance if you separate your data into separate databases in this case, so that whenever a user logs in to a project, the user is searched from 1 000 000 possible records instead of 1 000 x 1 000 000 possible records. The main question you need to ask yourself is: "are my datasets closely related"? If so, you might use a single database. If not, then separating them into different databases actually improves performance in theory instead of causing damage. Never optimize something that's not slow.

Upvotes: 3

Is it better to use one database or multiple databases?

Answers (2)

Related Questions