Zeruno
Zeruno

Reputation: 1639

Achieving window function-like behavior using a PostgreSQL user defined function?

Let's say that given a table observations_tbl with attributes date (day) and value, I want to produce the new attribute prev_day_value to get the following table:

|---------------------|-------|----------------|
|     date            | value | prev_day_value |
|---------------------|-------|----------------|
| 01.01.2015 00:00:00 | 5     | 0              |
| 02.01.2015 00:00:00 | 4     | 5              |
| 03.01.2015 00:00:00 | 3     | 4              |
| 04.01.2015 00:00:00 | 2     | 3              |
|---------------------|-------|----------------|

I am well-aware that such an output can typically be obtained using a WINDOW function. But how would I achieve this through a PostgreSQL user defined function? I want to indicate that I am in a situation where I must use a function, difficult to explain why without going into detail - these are the restrictions I have and if anything, it is a technical challenge.

Take into consideration this template query:

SELECT *, lag(value,1) AS prev_day_value -- or lag(record,1) or lag(date,value,1) or lag(date,1) or lag(observations_tbl,1), etc.
FROM observations_tbl

I am using function lag with parameter 1 to look for a value which comes before the current row by 1 - a distance of 1 row. I don't care what other parameters the function lag can have (table name, other attributes) - what could the function lag look like to achieve such functionality? The function can be of any language, SQL, PL/pgSQL and even C using PostgreSQL API/backend.

I understand that one answer can be wrapping a WINDOW query inside lag user defined function. But I am thinking that would be a rather costly operation if I have to scan the entire table twice (once inside the lag function and once outside). I was thinking that maybe each PostgreSQL record would have a pointer to its previous record which is directly accessible? Or that I can somehow open a cursor at this specific row / row number without having to scan the entire table? Or is what I am asking impossible?

Upvotes: 0

Views: 274

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656804

What Pavel posted, just with fewer assignments. Should be faster:

CREATE OR REPLACE FUNCTION report()
  RETURNS TABLE(d date, v int, prev_v int) AS
$func$
BEGIN
   prev_v := 0;
   FOR d, v IN
      SELECT date, value FROM observations_tbl ORDER BY 1
   LOOP
      RETURN NEXT;
      prev_v := v;
   END LOOP;
END
$func$  LANGUAGE plpgsql;

The general idea can pay if it actually replaces multiple scans over the table with a single one. Like here:

Upvotes: 1

Pavel Stehule
Pavel Stehule

Reputation: 45805

Your request is not possible to solve with relational tools (window functions are not relational extension in SQL). In C language you can write own alternative of function lag. You can do same work in PL8 language (Javascript). Unfortunately the API for window functions doesn't exist for PL/pgSQL. You cannot to write simple PL/pgSQL function that has access to different row than is processed.

The one possible alternative (but with some performance risk) is writing table function. There you have a control over all processed dataset, and you can do this operation simply.

CREATE OR REPLACE FUNCTION report()
RETURNS TABLE(d date, v int, prev_v int) $$
DECLARE r RECORD;
BEGIN
  prev_v := 0;
  FOR r IN SELECT date, value FROM observations_tbl t ORDER BY 1
  LOOP
    d := r.date; v := r.value;
    RETURN NEXT;
    prev_v := v;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

There is not any other alternative usable solution. In very old date these values was calculated with correlated selfjoins, but this solution has pretty terrible performance.

Upvotes: 2

Related Questions