Reputation: 23
Issuing the following hibernate hql to my Postgres db (9.2.3) with protocol version 3:
select count(*) , obj1.type , (obj1.creationTime + :p1 )
from fr.xxx.WorkflowPojo obj1
group by obj1.type , (obj1.creationTime + :p1 )
order by (obj1.creationTime + :p1 ) asc
Getting : ERROR: column "workflow.creation_time" must appear in the GROUP BY clause or be used in an aggregate function
While i get no exception with protocol version 2
In pgAdmin the following request is ok so i guess it's not a PgSql problem:
PREPARE test2(varchar) as
select count(*) ,
workflow.optype, (workflow.creation_time + $1::integer )
from workflow
group by workflow.optype, (workflow.creation_time + $1::integer )
order by (workflow.creation_time + $1::integer ) asc
Anyone has an idea how to solve this problem if i want to use PostgreSQL protocol version 3?
edit : Thanks Craig for helping. This is what I get from the postgreSQL perspective:
VERSION3 :
STATEMENT: select count(*) as col_0_0_, workflowdd0_.optype as col_1_0_, workflowdd0_.creation_time+$1 as col_2_0_
from workflow_ddc workflowdd0_
group by workflowdd0_.optype , workflowdd0_.creation_time+$2
order by workflowdd0_.creation_time+$3
VERSION 2 :
LOG: statement: select count(*) as col_0_0_, workflowdd0_.optype as col_1_0_, workflowdd0_.creation_time+3600 as col_2_0_
from workflow_ddc workflowdd0_
group by workflowdd0_.optype , workflowdd0_.creation_time+3600
order by workflowdd0_.creation_time+3600 asc
from the hibernate perspective :
Hibernate: select count(*) as col_0_0_, workflowdd0_.optype as col_1_0_, workflowdd0_.creation_time+? as col_2_0_
from workflow_ddc workflowdd0_
group by workflowdd0_.optype , workflowdd0_.creation_time+?
order by workflowdd0_.creation_time+? asc
With version 3 server-side prepared statement, it replaces $1 everywhere by $1 $2 $3 $4 ... With vesion 2, it replaces string client-side.
Maybe it's a jdbc driver bug ? it should stay $1 everywhere imho...
Thanks Christophe
Upvotes: 2
Views: 1413
Reputation: 3
For the record, newest version of the postgresql dropped support for protocol V2. The parameter preferQueryMode can be used instead to force the jdbc driver to send values "inline", instead of named parameters
Upvotes: 0
Reputation: 324475
This looks like a JDBC API limitation combined with PostgreSQL being fairly strict about GROUP BY
clauses.
The crucial difference is that your manual PgAdmin test uses a single parameter and uses it twice in the query. By contrast, the Hibernate query passes the value twice as two separate parameters. At PREPARE
time PostgreSQL cannot prove that $1
will always equal $2
even though in practice you know they will, so PostgreSQL refuses to plan the query.
Demo setup:
CREATE TABLE somedemo( x integer, y integer );
INSERT INTO somedemo(x,y) SELECT a,a from generate_series(1,15) a;
Demo 1, text substitution, works fine:
SELECT x, (y+1) FROM somedemo GROUP BY x, y+1;
Demo 2, single parameter, works fine because Pg can prove that (y+$1)
in one place is always equal to (y+$1)
in another place:
PREPARE preptest1(integer) AS select x, (y+$1) from somedemo GROUP BY x, y+$1;
EXECUTE preptest1(1);
Demo 3, two parameters. Fails because Pg can't prove that (y+$1)
is equal to (y+$2)
at PREPARE
time:
regress=> PREPARE preptest2(integer,integer) AS SELECT x, (y+$1) FROM somedemo GROUP BY x, y+$2;
ERROR: column "somedemo.y" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: PREPARE preptest2(integer,integer) AS SELECT x, (y+$1) FROM ...
^
This works when you force protocol level 2 because the JDBC driver substitutes the parameters server-side.
In Python+psycopg2 or other languages with more sophisticated database drivers I'd handle this with named or positional parameters:
$ python
Python 2.7.3 (default, Aug 9 2012, 17:23:57)
>>> import psycopg2
>>> conn = psycopg2.connect('')
>>> curs = conn.cursor()
>>> curs.execute("SELECT x, (y+%(parm)s) FROM somedemo GROUP BY x, y+%(parm)s", { 'parm': 1 })
>>> curs.fetchall()
[(15, 16), (3, 4), (12, 13), (14, 15), (10, 11), (11, 12), (8, 9), (5, 6), (13, 14), (1, 2), (2, 3), (4, 5), (7, 8), (9, 10), (6, 7)]
>>>
Unfortunately, it looks like JDBC only supports named parameters in CallableStatement
; yet again we see the pain of Java's legacy cruft coming to bite us.
To handle this server side PostgreSQL would have to delay planning these statements until it got the parameters and then execute it as a regular ad-hoc query. There's no support for doing this at this point, though some of the groundwork has been laid with the introduction of prepared statement replanning.
It isn't clear how we'd handle it transparently on the JDBC driver side. Even if we delayed sending the prepared statement until we got the 1st set of parameters we wouldn't know that "$1" will always be equal to "$2" (and can be combined) just because they're equal for the first execution...
Hibernate can't fix this; it knows that :p1
is the same parameter in all three places but it has no way to tell PostgreSQL that through the limitations of the JDBC positional parameter interface. It could substitute all parameters into the query text but that's almost always the wrong thing to do, this is a rather unusual corner-case.
The only solid fix I see for this is for PgJDBC to extend JDBC with named or ordinal parameters, like ?:p1
or ?:1
. Then Hibernate's PostgreSQL dialect could be extended to support them. To avoid compatibility problems it'd be necessary to require a connection parameter to be set in order to enable the extended parameter syntax. This all seems pretty painful, so I'd prefer to wait until the JDBC spec adds real named parameter support (ie: don't hold your breath, your grandchildren might live to see it happen) or just work around the problem.
I suspect the best option is to use a subquery to generate a virtual table with the generated value, then group by it in the outer query. The SQL to do this would look like:
SELECT x, y_plus FROM (
SELECT x, (y+?) FROM somedemo
) temptable(x,y_plus)
GROUP BY x, y_plus;
Only one reference to the parameter is required with this phrasing. Translating it to HQL is left as an exercise for the reader ;-) .
PostgreSQL's query optimiser will generally turn this into a plan that's just as efficient as the simple string-substituted form, as seen here:
regress=> PREPARE preptest5(integer) AS SELECT x, y_plus FROM (SELECT x, (y+$1) FROM somedemo) temptable(x,y_plus) GROUP BY x, y_plus;
regress=> explain EXECUTE preptest5(1);
QUERY PLAN
---------------------------------------------------------------
HashAggregate (cost=1.26..1.45 rows=15 width=8)
-> Seq Scan on somedemo (cost=0.00..1.19 rows=15 width=8)
(2 rows)
regress=> explain SELECT x, y+1 FROM somedemo GROUP BY x, y+1;
QUERY PLAN
---------------------------------------------------------------
HashAggregate (cost=1.26..1.45 rows=15 width=8)
-> Seq Scan on somedemo (cost=0.00..1.19 rows=15 width=8)
(2 rows)
For a non-performance-critical ad-hoc or infrequently used function you could work around this by writing a native query that passes the parameter once in a CTE VALUES clause, like:
PREPARE preptest3(integer) AS
WITH params(a) AS (VALUES($1))
SELECT x, (y+a) FROM somedemo CROSS JOIN params GROUP BY x, y+a;
EXECUTE preptest3(1);
Needless to say this is clumsy and may not perform particularly wonderfully, but it works in cases where you have to refer to the parameter in many different contexts.
If you can't use the subquery table approach set out earlier from HQL, a better alternative to the hacky CTE is to wrap the query up in an SQL function and call the function from JDBC, eg:
-- Define this in your database schema or run it on app startup:
CREATE OR REPLACE FUNCTION test4(integer) RETURNS TABLE (x integer, y integer) AS $$
SELECT x, (y+$1) FROM somedemo GROUP BY x, y+$1;
$$ LANGUAGE sql;
-- then in JDBC prepare a simple "SELECT * FROM test4(?)", resulting in:
PREPARE preptest4(integer) AS SELECT * FROM test4($1);
EXECUTE preptest4(1);
Upvotes: 2