richie-torres
richie-torres

Reputation: 756

What is the best way to distribute postgresql

I have a database in postgresql for a software as service with hundreds of customers, currently have a schema of postgresql for each customer, but i like a best solution because the customers rapidly increase. I read about cassandra but i don't wanna lose the integrity of primary,foregin keys and checks. Also read about postgresql in distributed systems, but i dont know what is the best way for implement this currently

Upvotes: 10

Views: 11174

Answers (2)

jbellis
jbellis

Reputation: 19377

i don't wanna lose the integrity of primary,foregin keys and checks

The point of systems like Cassandra is, once your dataset or workload doesn't fit on a single machine, you have to give up those things even if you stay on postgresql. (I covered the details in a talk that I highly recommend: http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2010-what-every-developer-should-know-about-database-scalability-21-3280648).

So Cassandra is an answer to the question, "If we know we're going to have to give up foreign keys and joins, what can we build by rethinking how we design our database?"

If you never get to that point, Cassandra is overkill. (But you should still watch that talk. :)

Upvotes: 7

kgrittn
kgrittn

Reputation: 19511

There are four levels at which you can separate your customers:

  1. Run a separate PostgreSQL cluster for each customer. This provides maximum separation; each client is on a separate port with its own set of system tables, transaction log, etc.

  2. Put each customer in a separate database in the same cluster. This way they each have a separate login, but on the same port number, and they share global tables like pg_database.

  3. Give each customer a separate schema in the same database. This doesn't require separate user IDs if they are only connecting through your software, because you can just set the search_path. Of course you can use separate user IDs if you prefer.

  4. Make customer_id part of the primary key of each table, and be sure to limit by that in your software. This is likely to scale better than having duplicate tables for each of hundreds of users, but you must be very careful to always qualify your queries by customer_id.

Some people have been known to combine these techniques, for example, limiting each cluster to 100 databases with a separate database for each customer.

Without more detail it's hard to know which configuration will be best for your situation, except to say that if you want to allow users direct access to the database, without going through your software, you need to think about what is visible in system tables with each option. Look at pg_database, pg_user, and pg_class from a user perspective, to see what is exposed.

Upvotes: 11

Related Questions