Reputation: 1639
Let's say that given a table observations_tbl
with attributes date
(day) and value
, I want to produce the new attribute prev_day_value
to get the following table:
|---------------------|-------|----------------|
| date | value | prev_day_value |
|---------------------|-------|----------------|
| 01.01.2015 00:00:00 | 5 | 0 |
| 02.01.2015 00:00:00 | 4 | 5 |
| 03.01.2015 00:00:00 | 3 | 4 |
| 04.01.2015 00:00:00 | 2 | 3 |
|---------------------|-------|----------------|
I am well-aware that such an output can typically be obtained using a WINDOW
function. But how would I achieve this through a PostgreSQL user defined function? I want to indicate that I am in a situation where I must use a function, difficult to explain why without going into detail - these are the restrictions I have and if anything, it is a technical challenge.
Take into consideration this template query:
SELECT *, lag(value,1) AS prev_day_value -- or lag(record,1) or lag(date,value,1) or lag(date,1) or lag(observations_tbl,1), etc.
FROM observations_tbl
I am using function lag
with parameter 1
to look for a value which comes before the current row by 1
- a distance of 1
row. I don't care what other parameters the function lag
can have (table name, other attributes) - what could the function lag
look like to achieve such functionality? The function can be of any language, SQL
, PL/pgSQL
and even C
using PostgreSQL API/backend.
I understand that one answer can be wrapping a WINDOW
query inside lag
user defined function. But I am thinking that would be a rather costly operation if I have to scan the entire table twice (once inside the lag
function and once outside). I was thinking that maybe each PostgreSQL record would have a pointer to its previous record which is directly accessible? Or that I can somehow open a cursor at this specific row / row number without having to scan the entire table? Or is what I am asking impossible?
Upvotes: 0
Views: 274
Reputation: 656804
What Pavel posted, just with fewer assignments. Should be faster:
CREATE OR REPLACE FUNCTION report()
RETURNS TABLE(d date, v int, prev_v int) AS
$func$
BEGIN
prev_v := 0;
FOR d, v IN
SELECT date, value FROM observations_tbl ORDER BY 1
LOOP
RETURN NEXT;
prev_v := v;
END LOOP;
END
$func$ LANGUAGE plpgsql;
The general idea can pay if it actually replaces multiple scans over the table with a single one. Like here:
Upvotes: 1
Reputation: 45805
Your request is not possible to solve with relational tools (window functions are not relational extension in SQL). In C language you can write own alternative of function lag. You can do same work in PL8 language (Javascript). Unfortunately the API for window functions doesn't exist for PL/pgSQL. You cannot to write simple PL/pgSQL function that has access to different row than is processed.
The one possible alternative (but with some performance risk) is writing table function. There you have a control over all processed dataset, and you can do this operation simply.
CREATE OR REPLACE FUNCTION report()
RETURNS TABLE(d date, v int, prev_v int) $$
DECLARE r RECORD;
BEGIN
prev_v := 0;
FOR r IN SELECT date, value FROM observations_tbl t ORDER BY 1
LOOP
d := r.date; v := r.value;
RETURN NEXT;
prev_v := v;
END LOOP;
END;
$$ LANGUAGE plpgsql;
There is not any other alternative usable solution. In very old date these values was calculated with correlated selfjoins, but this solution has pretty terrible performance.
Upvotes: 2