Remi-C
Remi-C

Reputation: 11

plpgsql expert : (set of record) input and output for a function

I need some help with plpgsql function in postgres. This question has been asked on psql-general without conclusive answer (here and here)

This boils down to this :

how to create a plpgsql function with a set of rows as input and a set of rows as output without using array (for performance).

I currently transmit input as the ref of a cursor but it is not satisfactory because it forces to compute twice the same query (SELECT ...hard_work...) AND it messes transactions in my application.

Currently it works like :

DECLARE cursor FOR ...hardwork...;

WITH first_query AS (
SELECT ...hard_work... --second computation of hard_work
   ),
second_query AS ( 
    SELECT ...another_query_using_hard_work...
   )
SELECT *
FROM my_function('cursor'::refcursor) f(...) ;

Ultimately I would like to have something like (not working)

WITH first_query AS ( 
    SELECT ...hard_work...
   ),
second_query AS ( 
    SELECT ...another_query_using_the_hard_work_query...
   )
SELECT *
FROM my_function('first_query'::regclass) f(...) ;

Of course SELECT ...hard_work... is expensive (around 50ms) and it would be better not to compute it twice. The application is data streaming, so time is precious, and data are heavy, so copying data in temp table may be worse than computing twice (typically tens of MB ).

Other solutions were proposed

I would be very grateful to have an advanced insight on the subject.

Best Regards,

Rémi-C

PS : link to questions on psql-general mailing list contains a lots of details about specific and even all the code.

PPS : version : postgres 9.2 . OS : Ubuntu 12.04 LTE, client : PGAdmin3 for test, node.js for prod.

Upvotes: 1

Views: 1068

Answers (2)

Chris Travers
Chris Travers

Reputation: 26464

I think the typical way to solve some thing like this would be with a custom aggregate/function. Then you can use whatever internal storage you want. Each row gets passed in directly, transformed, and then processed later.

Upvotes: 0

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657962

As far as I can see, you have not tried to use temporary tables, yet.

You can do this with a hardcoded name or even use dynamic names, possibly utilizing a SEQUENCE to get unique table names. You would then use dynamic SQL with EXECUTE inside a PL/pgSQL function and pass the table name (or even better: the object identifier type regclass) to it.

Be sure to run ANALYZE manually on bigger temporary tables right after major changes (usually after initially filling it), since temporary tables are not visible to the autovacuum daemon.
Possibly even create indices on big temp tables!

You can find many code example here on Stackoverflow. One particularly rich and loosely related answer:
Refactor a PL/pgSQL function to return the output of various SELECT queries

Upvotes: 1

Related Questions