Reputation: 646
I am trying to redesigning Pg database to gain more performance. Db is for ERP IS and it holds larger amount of date (four years). Every year was in separate database, which was a bad solution (building reports was pain in the a??), so I consolidated all four db's into one... but... some tables are just to large! In order to gain some performance I decided to divide data in tables. I have 2 ways to do this.
First: dividing tables into "arch_table" and "working_table" and using views for reporting.
or
Second: using partitioning (say separate partition for every year).
So, my question is which way is better ? Partitioning or some archiving system ?
Upvotes: 2
Views: 5232
Reputation:
PostgreSQL's partitioning is, effectively, a bunch of views that use a check constraint to verify that only correct data is in each partition. A parent table is created and additional partitions are created that inherit from the master:
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
);
CREATE TABLE measurement_y2006m02 ( ) INHERITS (measurement);
CREATE TABLE measurement_y2006m03 ( ) INHERITS (measurement);
...
CREATE TABLE measurement_y2007m11 ( ) INHERITS (measurement);
CREATE TABLE measurement_y2007m12 ( ) INHERITS (measurement);
CREATE TABLE measurement_y2008m01 ( ) INHERITS (measurement);
Obviously, I've omitted a bit of code, but you can check out the documentation on PostgreSQL table partitioning. The most important part of partitioning is to make sure you build automatic scripts to create new partitions into the future as well as merge old partitions.
Operationally, when PostgreSQL goes to run your query it looks at SELECT * FROM measurement WHERE logdate BETWEEN '2006-02-13' AND '2006-02-22';
the optimizer goes "AH HA! I know what's up here, there's a partition. I'll just look at table measurement_y2006m02
and pull back the appropriate data."
As you age data out of the main partitions, you can either just drop the old tables or else merge them into an archive partition. Much of this work can be automated through scripting - all you really need to do is write the scripts once and test it. A side benefit is that older data tends to not change - many partitions will require no index maintenance or vacuuming.
Keep in mind that partitioning is largely a data management solution and may not provide the performance benefit that you're looking for. Tuning queries, applying indexes, and examining the PostgreSQL configuration (postgresql.conf, storage configuration, and OS configuration) may lead to far bigger performance gains that partitioning your data.
Upvotes: 10
Reputation: 896
You should use partitioning with any of those ways. It's exactly what you need.
Upvotes: 2