RandomB
RandomB

Reputation: 3737

How to match queries in pg_stat_statements with queries in the log from auto_explain extension?

I use 2 sources of information about my queries performance: pg_stat_statements table and the log file with output from auto_explain extension. The pg_stat_statements table has queryid column (some hash) which looks like:

postgres=# select queryid from pg_stat_statements;
       queryid        
----------------------
  1809546128015582813
  3296564334073921976
 -2249120274633451873
 -4210932522978766496
 -7924989344813642498
 -5969017373161620754
 -3085278641555168774
 -4062271456003967982
  5649680170374315937
...

And the log contains outputs from auto_explain extension which looks like:

...
2020-02-21 13:57:17.132 UTC [postgres postgres 127.0.0.1(59706)] LOG:  duration: 0.744 ms  plan:
    Query Text: DECLARE temp1 NO SCROLL CURSOR FOR select * from city
    Seq Scan on public.city  (cost=0.00..72.79 rows=4079 width=31) (actual time=0.016..0.475 rows=4079 loops=1)
      Output: id, name, countrycode, district, population
2020-02-21 13:57:17.355 UTC [postgres postgres 127.0.0.1(59706)] LOG:  duration: 2.447 ms  plan:
    Query Text: select * from city
    Seq Scan on public.city  (cost=0.00..72.79 rows=4079 width=31) (actual time=0.008..0.380 rows=4079 loops=1)
      Output: id, name, countrycode, district, population
...

Now I want to match queries from the pg_stat_statements with ones in the log. Obvious the criteria could be a queryid - unique identifier (a hash) of the query but there are 3 problems in this approach:

  1. Queries in the pg_stat_statements may look like select * from city where name = $1 and in the log like select * from city where name = 'Béjaïa'
  2. No any queryid in the log sure.
  3. No any guarantees on spaces, case and similar forms...

Is it possible to match them in some way? Maybe Postgres has some function like pg_get_queryid_from_query_sql(the_sql) ? Is it possible at all?

Upvotes: 0

Views: 957

Answers (2)

pifor
pifor

Reputation: 7882

I don't think that matching pg_stat_statements queryid with auto_explain output is currentlty possible because - as far as I understand - queryid is computed only in pg_stat_statements extension. However another extension pg_sampletolog allows to have queryid in PG log in some cases (but not as auto_explain output).

UPDATE 01-APR-2020 I have developped an PostgreSQL extension for this named pg_logqueryid

Upvotes: 1

Lukas Fittl
Lukas Fittl

Reputation: 361

It isn't possible to do that with built-in Postgres features, but you can do it with some scripting and post-processing of the data.

One way to achieve this is using the pg_query library, that effectively extracts the Postgres parser to be able to run outside the server. You can then use the fingerprint mechanism to match up the queries. Fingerprints are based on the query's AST, and ignore whitespace/formatting/parameterization.

Example:

require "pg_query"

# This would be the "query" field from pg_stat_statements
PgQuery.parse("select * from city where x = $1").fingerprint
=> "0271d348571e386f82a922b54a01e6ef9205a8f2fe"

# This would be the "Query Text" field from auto_explain
PgQuery.parse("SELECT * FROM city WHERE x = 123").fingerprint
=> "0271d348571e386f82a922b54a01e6ef9205a8f2fe"

Disclaimer: I'm the main author of pg_query, and we've recently built a similar system in pganalyze.

Upvotes: 1

Related Questions