Reputation: 381
My Postgres version is 9.6.12. When I run the query below, Postgres crashes with the error below. The query works fine when I replace LEFT JOIN with just a JOIN
proddb=# SELECT 1 - count(event_date) AS result FROM (SELECT now()::date AS run_date) p
JOIN historic.audit_event ON event_code = 5199
AND event_param1 = 'fullscriptpendingorders' AND event_date > run_date
AND event_date < (run_date + '1 day'::interval);
result
--------
0
(1 row)
-- When I change the JOIN to LEFT JOIN
proddb=# SELECT 1 - count(event_date) AS result FROM (SELECT now()::date AS run_date) p
**LEFT JOIN** historic.audit_event ON event_code = 5199
AND event_param1 = 'fullscriptpendingorders'
AND event_date > run_date
AND event_date < (run_date + '1 day'::interval);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The postgres log shows the error below showing that Postgres crashed. Postgres automaticaly recovers after a few minutes, but this is impacting critical Production. Can you please help? Is this a Postgres bug?
2019-04-08 21:32:56 PDT [26911]: [19631-1] [] [] LOG: 00000: server process (PID 23981) was terminated by signal 11: Segmentation fault
2019-04-08 21:32:56 PDT [26911]: [19632-1] [] [] DETAIL: Failed process was running: SELECT 1 - count(event_date) AS result FROM (SELECT now()::date AS run_date) p LEFT JOIN historic.audit_event ON event_code = 5199 AND event_param1 = 'amsa.fullscriptpendingorders' AND event_date > run_date AND event_date < (run_date + '1 day'::interval);
2019-04-08 21:32:56 PDT [26911]: [19633-1] [] [] LOCATION: LogChildExit, postmaster.c:3574
2019-04-08 21:32:56 PDT [26911]: [19634-1] [] [] LOG: 00000: terminating any other active server processes
2019-04-08 21:32:56 PDT [26911]: [19635-1] [] [] LOCATION: HandleChildCrash, postmaster.c:3294
2019-04-08 21:32:56 PDT [24633]: [1-1] [[unknown]] [[unknown]] LOG: 00000: connection received: host=[local]
2019-04-08 21:32:56 PDT [24633]: [2-1] [[unknown]] [[unknown]] LOCATION: BackendInitialize, postmaster.c:4192
2019-04-08 21:32:56 PDT [24633]: [3-1] [postgres] [emr_prod] FATAL: 57P03: the database system is in recovery mode
2019-04-08 21:32:56 PDT [24633]: [4-1] [postgres] [emr_prod] LOCATION: ProcessStartupPacket, postmaster.c:2230
2019-04-08 21:32:56 PDT [26911]: [19636-1] [] [] LOG: 00000: all server processes terminated; reinitializing
2019-04-08 21:32:56 PDT [26911]: [19637-1] [] [] LOCATION: PostmasterStateMachine, postmaster.c:3818
I don't see this behavior when I have the exact dataset on a QA database which is version 9.6.9
postgres@emr_qa07=# select version();
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
(1 row)
postgres@emr_qa07=# SELECT 1 - count(event_date) AS result FROM (SELECT now()::date AS run_date) p JOIN historic.audit_event ON event_code = 5199 AND event_param1 = 'fullscriptpendingorders' AND event_date > run_date AND event_date < (run_date + '1 day'::interval);
result
--------
1
(1 row)
postgres@emr_qa07=# SELECT 1 - count(event_date) AS result FROM (SELECT now()::date AS run_date) p LEFT JOIN historic.audit_event ON event_code = 5199 AND event_param1 = 'fullscriptpendingorders' AND event_date > run_date AND event_date < (run_date + '1 day'::interval);
result
--------
1
(1 row)
Upvotes: 2
Views: 6069
Reputation: 4314
Yes, this is a PostgreSQL bug - it should never segfault no matter what query you give it. If you can reduce it to a minimal test case, the developers would probably be interested in a bug report.
The other possibility here is that the underlying data files for your database have become corrupt. It's still a bug that PostgreSQL crashes, but going to be much harder to determine what happened. Potentially dumping and restoring the database may fix it.
Upvotes: 5