StAlphonzo

Reputation: 766

Vastly different query run time in application

I'm having a scaling issue with an application that uses a PostgreSQL 9 backend. I have one table whose size is about 40 million records and growing, and conditional queries against it have slowed down dramatically.

To help figure out what's going wrong, I've taken a development snapshot of the database and dumped the queries with their execution times into the log.
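(For reference, a common way to get statements together with their run times into the PostgreSQL log is the log_min_duration_statement setting; a minimal sketch with a made-up database name and threshold, not necessarily how the logging was set up here:)

    -- Requires superuser; takes effect for new sessions in that database.
    -- The same setting can also go into postgresql.conf for the whole cluster.
    ALTER DATABASE mydb SET log_min_duration_statement = 500;  -- log statements slower than 500 ms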

Now for the confusing part, and the gist of the question ....

The run times for my queries in the log are vastly different (an order of magnitude or more) than what I get when I run the 'exact' same query in DbVisualizer to get the explain plan.

I say 'exact', but really the difference is that the application uses a prepared statement to which I bind values at runtime, while the queries I run in DbVisualizer have those values in place already. The values themselves are exactly as I pulled them from the log.

Could the use of prepared statements make that big of a difference?

Upvotes: 2

Views: 1956

Answers (2)

Erwin Brandstetter

Reputation: 657982

The answer is YES. Prepared statements cut both ways.

On the one hand, the query does not have to be re-planned for every execution, saving some overhead. This can make a difference or be hardly noticeable, depending on the complexity of the query.

On the other hand, with uneven data distribution, a one-size-fits-all query plan may be a bad choice. Called with particular values, another query plan could be (much) better suited.

Running the query with parameter values in place can lead to a different query plan. More planning overhead, possibly a (much) better query plan.
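You can see the difference yourself by comparing both plans; a minimal sketch (table, column and value are made up; on 9.0/9.1 the prepared statement gets a generic plan, while later versions may switch between generic and custom plans):

    -- Prepared statement: planned without knowledge of the actual parameter value
    PREPARE q(int) AS
    SELECT * FROM big_table WHERE customer_id = $1;

    EXPLAIN ANALYZE EXECUTE q(42);   -- generic plan

    -- Same query with the value inlined: the planner can use column statistics for 42
    EXPLAIN ANALYZE
    SELECT * FROM big_table WHERE customer_id = 42;

    DEALLOCATE q;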

Also consider unnamed prepared statements like @peufeu provided. Those re-plan the query considering parameters every time - and you still have safe parameter handling.

Similar considerations apply to queries inside PL/pgSQL functions, where queries can be treated as prepared statements internally - unless executed dynamically with EXECUTE. I quote the manual on Executing Dynamic Commands:

The important difference is that EXECUTE will re-plan the command on each execution, generating a plan that is specific to the current parameter values; whereas PL/pgSQL may otherwise create a generic plan and cache it for re-use. In situations where the best plan depends strongly on the parameter values, it can be helpful to use EXECUTE to positively ensure that a generic plan is not selected.
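As an illustration, a minimal PL/pgSQL sketch (function, table and column names are made up): the first function uses static SQL, whose plan PL/pgSQL may cache; the second is re-planned on every call because the query goes through EXECUTE:

    CREATE OR REPLACE FUNCTION get_orders_static(_customer_id int)
      RETURNS SETOF orders AS
    $$
    BEGIN
       RETURN QUERY                        -- static SQL: plan can be cached
       SELECT * FROM orders WHERE customer_id = _customer_id;
    END
    $$ LANGUAGE plpgsql;

    CREATE OR REPLACE FUNCTION get_orders_dynamic(_customer_id int)
      RETURNS SETOF orders AS
    $$
    BEGIN
       RETURN QUERY EXECUTE                -- dynamic SQL: re-planned per call
       'SELECT * FROM orders WHERE customer_id = $1'
       USING _customer_id;
    END
    $$ LANGUAGE plpgsql;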

Apart from that, general guidelines for performance optimization apply.

Upvotes: 3

bobflux

Reputation: 11581

Erwin nails it, but let me add that the extended query protocol allows you to use more flavors of prepared statements. Besides avoiding re-parsing and re-planning, one big advantage of prepared statements is that parameter values are sent separately, which avoids escaping and parsing overhead, not to mention the opportunity for SQL injection and bugs if you don't use an API that handles parameters in such a way that you can't forget to escape them.

http://www.postgresql.org/docs/9.1/static/protocol-flow.html

Query planning for named prepared-statement objects occurs when the Parse message is processed. If a query will be repeatedly executed with different parameters, it might be beneficial to send a single Parse message containing a parameterized query, followed by multiple Bind and Execute messages. This will avoid replanning the query on each execution.

The unnamed prepared statement is likewise planned during Parse processing if the Parse message defines no parameters. But if there are parameters, query planning occurs every time Bind parameters are supplied. This allows the planner to make use of the actual values of the parameters provided by each Bind message, rather than use generic estimates.

So, if your DB interface supports it, you can use unnamed prepared statements. It's a bit of a middle ground between a query and a usual prepared statement.

If you use PHP with PDO, please note that PDO's prepared statement implementation is rather useless for Postgres: it uses named prepared statements, but re-prepares every time you call prepare(), so no plan caching takes place. You get the worst of both worlds: many round trips and a plan made without parameter values. I've seen it be 1000x slower than pg_query() and pg_query_params() on specific queries where the Postgres optimizer really needs to know the parameters to produce the optimal plan. pg_query uses raw queries, while pg_query_params uses unnamed prepared statements. Usually one is faster than the other, depending on the size of the parameter data.

Upvotes: 3
