Hibernate / Postgresql - Protocol version 3/2 prepared statement /parameters

Question

Issuing the following hibernate hql to my Postgres db (9.2.3) with protocol version 3:

select  count(*)  , obj1.type , (obj1.creationTime + :p1 )  
from fr.xxx.WorkflowPojo obj1 
group by obj1.type  , (obj1.creationTime + :p1 )  
order by  (obj1.creationTime + :p1 ) asc

Getting : ERROR: column "workflow.creation_time" must appear in the GROUP BY clause or be used in an aggregate function

While i get no exception with protocol version 2

In pgAdmin the following request is ok so i guess it's not a PgSql problem:

PREPARE test2(varchar) as 
select  count(*)  , 
workflow.optype, (workflow.creation_time + $1::integer )
from workflow
group by workflow.optype, (workflow.creation_time + $1::integer )
order by  (workflow.creation_time + $1::integer ) asc

Anyone has an idea how to solve this problem if i want to use PostgreSQL protocol version 3?

edit : Thanks Craig for helping. This is what I get from the postgreSQL perspective:

VERSION3 :

    STATEMENT:  select count(*) as col_0_0_, workflowdd0_.optype as col_1_0_,       workflowdd0_.creation_time+$1 as col_2_0_ 
    from workflow_ddc workflowdd0_ 
    group by workflowdd0_.optype , workflowdd0_.creation_time+$2 
    order by workflowdd0_.creation_time+$3

VERSION 2 :

    LOG:  statement: select count(*) as col_0_0_, workflowdd0_.optype as col_1_0_, workflowdd0_.creation_time+3600 as col_2_0_ 
    from workflow_ddc workflowdd0_ 
    group by workflowdd0_.optype , workflowdd0_.creation_time+3600 
    order by workflowdd0_.creation_time+3600 asc

from the hibernate perspective :

    Hibernate: select count(*) as col_0_0_, workflowdd0_.optype as col_1_0_, workflowdd0_.creation_time+? as col_2_0_ 
    from workflow_ddc workflowdd0_ 
    group by workflowdd0_.optype , workflowdd0_.creation_time+? 
    order by workflowdd0_.creation_time+? asc

With version 3 server-side prepared statement, it replaces $1 everywhere by $1 $2 $3 $4 ... With vesion 2, it replaces string client-side.

Maybe it's a jdbc driver bug ? it should stay $1 everywhere imho...

Thanks Christophe

Craig Ringer · Accepted Answer

This looks like a JDBC API limitation combined with PostgreSQL being fairly strict about GROUP BY clauses.

The crucial difference is that your manual PgAdmin test uses a single parameter and uses it twice in the query. By contrast, the Hibernate query passes the value twice as two separate parameters. At PREPARE time PostgreSQL cannot prove that $1 will always equal $2 even though in practice you know they will, so PostgreSQL refuses to plan the query.

Demo of the problem

Demo setup:

CREATE TABLE somedemo( x integer, y integer );
INSERT INTO somedemo(x,y) SELECT a,a from generate_series(1,15) a;

Demo 1, text substitution, works fine:

SELECT x, (y+1) FROM somedemo GROUP BY x, y+1;

Demo 2, single parameter, works fine because Pg can prove that (y+$1) in one place is always equal to (y+$1) in another place:

PREPARE preptest1(integer) AS select x, (y+$1) from somedemo GROUP BY x, y+$1;
EXECUTE preptest1(1);

Demo 3, two parameters. Fails because Pg can't prove that (y+$1) is equal to (y+$2) at PREPARE time:

regress=> PREPARE preptest2(integer,integer) AS SELECT x, (y+$1) FROM somedemo GROUP BY x, y+$2;
ERROR:  column "somedemo.y" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: PREPARE preptest2(integer,integer) AS SELECT x, (y+$1) FROM ...
                                                         ^

This works when you force protocol level 2 because the JDBC driver substitutes the parameters server-side.

How other languages handle this

In Python+psycopg2 or other languages with more sophisticated database drivers I'd handle this with named or positional parameters:

$ python
Python 2.7.3 (default, Aug  9 2012, 17:23:57) 
>>> import psycopg2
>>> conn = psycopg2.connect('')
>>> curs = conn.cursor()
>>> curs.execute("SELECT x, (y+%(parm)s) FROM somedemo GROUP BY x, y+%(parm)s", { 'parm': 1 })
>>> curs.fetchall()
[(15, 16), (3, 4), (12, 13), (14, 15), (10, 11), (11, 12), (8, 9), (5, 6), (13, 14), (1, 2), (2, 3), (4, 5), (7, 8), (9, 10), (6, 7)]
>>>

Unfortunately, it looks like JDBC only supports named parameters in CallableStatement; yet again we see the pain of Java's legacy cruft coming to bite us.

Why fixing it isn't simple

To handle this server side PostgreSQL would have to delay planning these statements until it got the parameters and then execute it as a regular ad-hoc query. There's no support for doing this at this point, though some of the groundwork has been laid with the introduction of prepared statement replanning.

It isn't clear how we'd handle it transparently on the JDBC driver side. Even if we delayed sending the prepared statement until we got the 1st set of parameters we wouldn't know that "$1" will always be equal to "$2" (and can be combined) just because they're equal for the first execution...

Hibernate can't fix this; it knows that :p1 is the same parameter in all three places but it has no way to tell PostgreSQL that through the limitations of the JDBC positional parameter interface. It could substitute all parameters into the query text but that's almost always the wrong thing to do, this is a rather unusual corner-case.

The only solid fix I see for this is for PgJDBC to extend JDBC with named or ordinal parameters, like ?:p1 or ?:1. Then Hibernate's PostgreSQL dialect could be extended to support them. To avoid compatibility problems it'd be necessary to require a connection parameter to be set in order to enable the extended parameter syntax. This all seems pretty painful, so I'd prefer to wait until the JDBC spec adds real named parameter support (ie: don't hold your breath, your grandchildren might live to see it happen) or just work around the problem.

Workarounds

I suspect the best option is to use a subquery to generate a virtual table with the generated value, then group by it in the outer query. The SQL to do this would look like:

SELECT x, y_plus FROM (
  SELECT x, (y+?) FROM somedemo
) temptable(x,y_plus)
GROUP BY x, y_plus;

Only one reference to the parameter is required with this phrasing. Translating it to HQL is left as an exercise for the reader ;-) .

PostgreSQL's query optimiser will generally turn this into a plan that's just as efficient as the simple string-substituted form, as seen here:

regress=> PREPARE preptest5(integer) AS SELECT x, y_plus FROM (SELECT x, (y+$1) FROM somedemo) temptable(x,y_plus) GROUP BY x, y_plus;
regress=> explain EXECUTE preptest5(1);

                          QUERY PLAN                           
---------------------------------------------------------------
 HashAggregate  (cost=1.26..1.45 rows=15 width=8)
   ->  Seq Scan on somedemo  (cost=0.00..1.19 rows=15 width=8)
(2 rows)

regress=> explain SELECT x, y+1 FROM somedemo GROUP BY x, y+1;
                          QUERY PLAN                           
---------------------------------------------------------------
 HashAggregate  (cost=1.26..1.45 rows=15 width=8)
   ->  Seq Scan on somedemo  (cost=0.00..1.19 rows=15 width=8)
(2 rows)

For a non-performance-critical ad-hoc or infrequently used function you could work around this by writing a native query that passes the parameter once in a CTE VALUES clause, like:

PREPARE preptest3(integer) AS 
WITH params(a) AS (VALUES($1)) 
SELECT x, (y+a) FROM somedemo CROSS JOIN params GROUP BY x, y+a;

EXECUTE preptest3(1);

Needless to say this is clumsy and may not perform particularly wonderfully, but it works in cases where you have to refer to the parameter in many different contexts.

If you can't use the subquery table approach set out earlier from HQL, a better alternative to the hacky CTE is to wrap the query up in an SQL function and call the function from JDBC, eg:

-- Define this in your database schema or run it on app startup:
CREATE OR REPLACE FUNCTION test4(integer) RETURNS TABLE (x integer, y integer) AS $$
    SELECT x, (y+$1) FROM somedemo GROUP BY x, y+$1;
$$ LANGUAGE sql;

-- then in JDBC prepare a simple "SELECT * FROM test4(?)", resulting in:

PREPARE preptest4(integer) AS SELECT * FROM test4($1);

EXECUTE preptest4(1);

Hibernate / Postgresql - Protocol version 3/2 prepared statement /parameters

Answers (2)

Demo of the problem

How other languages handle this

Why fixing it isn't simple

Workarounds

Related Questions