RGPT
RGPT

Reputation: 574

Postgres partition by week

I can imagine table partition by a date (in particular for logs) is something widely used, but I am not able to find a good answer to my problem.

I want to create a table partition by week (the number of records is to big to make it monthly). The reason for it to be weekly is that I need the data for an algorithm that will be looking for dates as part of the process.

My problem is I want it to create the partitions considering the week and using the "typical" approach I would have to be creating it manually. Something like this.

    CREATE TABLE measurement_y2013w01 (
CHECK ( logdate >= DATE '2013-01-07' AND logdate < DATE '2013-01-14' ) 
    ) INHERITS (measurement);

    CREATE TABLE measurement_y2006w02 (
CHECK ( logdate >= DATE '2013-01-14' AND logdate < DATE '2013-01-21' ) 
    ) INHERITS (measurement);

... 

But I want it to be made automatically. I don't want to be creating a partition for every week one by one.

My rule for name giving would be yYYYYwWW for the naming of the partition or start datadYYYYMMDD.

I thought in checking for partitions when inserting using something like this:

 SELECT
    nmsp_parent.nspname AS parent_schema,
    parent.relname      AS parent,
    nmsp_child.nspname  AS child,
    child.relname       AS child_schema
FROM pg_inherits
    JOIN pg_class parent        ON pg_inherits.inhparent = parent.oid
    JOIN pg_class child         ON pg_inherits.inhrelid   = child.oid
    JOIN pg_namespace nmsp_parent   ON nmsp_parent.oid  = parent.relnamespace
    JOIN pg_namespace nmsp_child    ON nmsp_child.oid   = child.relnamespace

and if the partition is not there create it before the insert, but this seams a lot inefficient, considering the number of records inserted.

My other alternative is to have an external process running weekly creating this partition, but I was trying to avoid this.

Is there a more efficient solution for this that I am missing, that is used, for example, for monthly checks?

Upvotes: 9

Views: 11916

Answers (4)

yucer
yucer

Reputation: 5039

You can create one insertion trigger that creates the partition if it doesn't exists:

CREATE OR REPLACE FUNCTION trg_measurement_partition()
    RETURNS trigger AS
$func$
DECLARE
    _tbl text := to_char(NEW.logdate, '"measurement_y"IYYY_"w"IW');
    _rec_date date := NEW.logdate::date;
    _min_date date := date_trunc('week', NEW.logdate)::date;
    _max_date date := date_trunc('week', NEW.logdate)::date + 7;
    _min_live date := date_trunc('week', NEW.logdate)::date - 1;
BEGIN
IF NOT EXISTS (
    SELECT 1
    FROM   pg_catalog.pg_class c
    JOIN   pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE  n.nspname = 'public'
    AND    c.relname = _tbl
    AND    c.relkind = 'r') THEN
    EXECUTE format('CREATE TABLE IF NOT EXISTS %I
                    (CHECK (logdate::date >= to_date(%L, ''yyyy-mm-dd'') AND
                            logdate::date <  to_date(%L, ''yyyy-mm-dd'')),
                        LIKE measurement INCLUDING INDEXES )
                        INHERITS (measurement)'
            , _tbl
            , to_char(_min_date, 'YYYY-MM-DD')
            , to_char(_max_date, 'YYYY-MM-DD')
            );
    IF (_rec_date >= _min_date) and (_rec_date < _max_date) THEN
        EXECUTE format('CREATE OR REPLACE VIEW v_measurement AS (
                            SELECT * FROM measurement
                            WHERE (logdate::date >= to_date(%L, ''yyyy-mm-dd'') AND
                                logdate::date <  to_date(%L, ''yyyy-mm-dd''))
                        )'
                , to_char(_min_live, 'YYYY-MM-DD')
                , to_char(_max_date, 'YYYY-MM-DD')
                );
    END IF;
END IF;
EXECUTE 'INSERT INTO ' || quote_ident(_tbl) || ' VALUES ($1.*)'
USING NEW;
RETURN NULL;
END
$func$ LANGUAGE plpgsql SET search_path = public;


DROP TRIGGER IF EXISTS ins_measurement on measurement;
CREATE TRIGGER ins_measurement
BEFORE INSERT ON measurement
FOR EACH ROW EXECUTE PROCEDURE trg_measurement_partition();

The trigger performs the following tasks:

  1. Check if the table exists (querying pg_catalog is very fast).
  2. Calculate the name of the corresponding table for the records logdate
  3. In case the partition table with that name doesn't exist, create a new one for the corresponding week (CREATE TABLE ... LIKE measurement INCLUDING INDEXES INHERITS (measurement))
  4. Create or update a view called v_measurement for the data of the last week.

The answer assumes that an empty measurement table exists as template and that the logdate field is timestamp. In that table, it is better to create one index for the logdate field typecasted to date:

drop index if exists measurement_logdate_part_idx;

CREATE UNIQUE INDEX measurement_logdate_part_idx
ON measurement((logdate::date) DESC NULLS LAST, id);

The code was adapted from another similar stackoverflow response.

Upvotes: 1

Max Murphy
Max Murphy

Reputation: 1973

In case it helps, I wrote a postgres trigger to create a table that is automatically partitioned by day. Creating inherited tables happens automatically. To partition by week you would have to change the day->string mapping and that's it.

https://github.com/bitdivine/pg_day_partitions

Upvotes: 0

Lucas Leite
Lucas Leite

Reputation: 467

Ok, so let's create ourselves a function to handle that!

CREATE OR REPLACE FUNCTION create_partition_measurement( DATE, DATE )
returns void AS $$
DECLARE
create_query text;
BEGIN
    FOR create_query IN SELECT

        'CREATE TABLE measurement_' || TO_CHAR( d, 'YYYY_WW' ) || ' (
        CHECK ( EXTRACT(YEAR FROM logdate) = EXTRACT(YEAR FROM TIMESTAMP ''' || d || ''') AND EXTRACT(WEEK FROM logdate) = EXTRACT(WEEK FROM TIMESTAMP ''' || d || ''') )
        ) INHERITS (measurement);'

        FROM generate_series( $1, $2, '1 week' ) AS d LOOP

        EXECUTE create_query;

    END LOOP;
END;
$$
language plpgsql;

With this you can now call something like

SELECT create_partition_measurement ('2015/02/08','2015/03/01'); 

and have your partitions created. First step to automation, done.

I tested all this in my own database using the following test table:

CREATE TABLE measurement (id INT NOT NULL PRIMARY KEY, id_user INT NOT NULL, logdate TIMESTAMP NOT NULL);

After creating the partitions using the function above, I was able to:

  • Insert data into the correct partition;
  • Get an error when trying to insert data from one week into another week's partition;
  • Automatically create partition for several weeks, and;
  • Get an error should I try to create a partition for a week where one already exists.

This should be enough =)

Now, about automating the creation process. I use a simple cron script to call this function for me every month and a couple of monitoring scripts to make sure everything is working as it should. The cron looks like this:

0 0 1 * * /var/lib/postgresql/create_partitions.sh

And the script will run the command using the current date and the current date + 1 month. It looks something like this:

startDate=`date "+%Y/%m/%d"`
endDate=`date -u -d "+1 month -$(date +%d) days" "+%Y/%m/%d"
psql -U "$dbUser" -w -c "SELECT create_partition_measurement('$startDate','$endDate');"

If you need to include indexes, PKs, FKs in the table, or help with the triggers to make all this work, just let me know.

Upvotes: 7

nad2000
nad2000

Reputation: 4905

You can use date_trunc function to round data value to the first day of the week. For partition naming you could use year and week number in the year YYWW:

CREATE TABLE measurement_1301 (
    CHECK ( date_trunc( 'week', logdate )::date = DATE '2013-01-07') )
    INHERITS (measurement);

CREATE TABLE measurement_1302 (
    CHECK ( date_trunc( 'week', logdate )::date = DATE '2013-01-14') )
    INHERITS (measurement);

CREATE TABLE measurement_1303 (
    CHECK ( date_trunc( 'week', logdate )::date = DATE '2013-01-21') )
    INHERITS (measurement);

-- Default partition:
CREATE TABLE measurement_default () INHERITS (measurement);

For partition name generation use to_char( logdate::date, 'YYWW') and if you if you prefer yYYYYwWW: to_char( logdate::date, '"y"YYYY"w"WW')

And to check the existing partitions you could use very simple query:

SELECT relname FROM pg_class
WHERE relname ~ '^measurement_[0-9]{4}$'
ORDER BY RIGHT(relname,4) DESC

Data routing trigger inserts into appropriate partition and fall back to the default, if there is no partition for the given week.

CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN

    IF to_char( NEW.logdate::date, 'YYWW') = '1301' THEN
        INSERT INTO measurement_1301 VALUES (NEW.*);
    ELSIF to_char( NEW.logdate::date, 'YYWW') = '1302' THEN
        INSERT INTO measurement_1302 VALUES (NEW.*);
    ELSIF to_char( NEW.logdate::date, 'YYWW') = '1303' THEN
        INSERT INTO measurement_1303 VALUES (NEW.*);
    -- ...
    ELSE
      INSERT INTO measurement_default VALUES (NEW.*);
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurement_insert_tr BEFORE INSERT ON measurement
FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger()

You will have create in advance all partitions. Or you can use the default partition and time to time re-partition the data stored there, creating new partitions and adjusting the insert trigger.

PS you can find scripts for partitioning solution based on triggers here http://hg.nowitworks.eu/postgresql-triggers-and-partitions

Upvotes: 2

Related Questions