Reputation: 101
In order to improve the scalability of my database, I would like to have multiple instances of PostgreSQL distributed over multiple machines. (Horizontal sharding)
Most of the operations executed on my database are INSERTs - there are a few occasional SELECTs as well, however the main emphasis is still on the INSERTs and thus the database is growing huge quickly.
For example: imagine we have 3 tables named user
, product
and activity(user_id, product_id, activity(viewed, bought...), timestamp)
.
Folks here on the business side use pgAdmin to query the (for now) unique database. What would be a great tool that would allow to have the SQL statements be distributed over the cluster and come back with a reduced set of results without having to to use the UNION
keyword in our SELECT queries?
For example, consider the following SQL SELECT statement:
select u.name FROM user AS u INNER JOIN activity AS a ON u.id = a.id INNER JOIN
product AS p ON p.id = a.id WHERE product.name='blabla';
Ideally nothing would change for the folks using pgAdmin, but under the hood the request would get chopped into smaller "pieces" and distributed to all of the servers in the cluster.
While searching the web, I came across Skype's PL/Proxy but is this project still active? (Same question for Postgres-XC.)
Any help will be much appreciated, thank you.
Upvotes: 0
Views: 1517
Reputation: 30372
Take a look at pgpool http://www.pgpool.net/docs/latest/pgpool-en.html
Specifically at parallel query. "Parallel Query Using the parallel query feature, data can be split among multiple servers, so that a query can be executed on all the servers concurrently, reducing the overall execution time. Parallel query works best when searching large-scale data."
Upvotes: 2