I am having a problem with a PostgreSQL DB: it crashes completely on a query that used to work fine some time ago:
2025-02-12 16:07:39.937 UTC [1] LOG: server process (PID 104) was terminated by signal 11: Segmentation fault
2025-02-12 16:07:39.937 UTC [1] DETAIL: Failed process was running: WITH geojson AS (
SELECT
ST_SetSRID(
ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[11.989549827991663,51.48198164709467],[11.978044706259787,51.482274616326],[11.977576493157148,51.475090930653295],[11.989079812591411,51.4747980363145],[11.989549827991663,51.48198164709467]]]}'), 4326
) AS geom
)
SELECT jsonb_build_object(
'CityJsonFeature', cj_geometry.cityjsonfeature,
'CityJson', jsonb_build_object(
'type', 'CityJSON',
'version', '2.0',
'CityObjects', '{}'::jsonb,
'vertices', '[]'::jsonb,
'transform', jsonb_build_object(
'scale', metadata.scale,
'translate', metadata.translate
),
'metadata', jsonb_build_object(
'referenceSystem', 'EPSG:3857'
)
)
)
FROM cj_geometry
JOIN m
2025-02-12 16:07:39.937 UTC [1] LOG: terminating any other active server processes
2025-02-12 16:07:39.939 UTC [1] LOG: all server processes terminated; reinitializing
2025-02-12 16:07:39.964 UTC [106] LOG: database system was interrupted; last known up at 2025-02-12 16:02:24 UTC
2025-02-12 16:07:41.601 UTC [106] DEBUG: checkpoint record is at 29/1A00A0E0
2025-02-12 16:07:41.601 UTC [106] DEBUG: redo record is at 29/1A00A0E0; shutdown true
2025-02-12 16:07:41.601 UTC [106] DEBUG: next transaction ID: 908; next OID: 23292321
2025-02-12 16:07:41.601 UTC [106] DEBUG: next MultiXactId: 1; next MultiXactOffset: 0
2025-02-12 16:07:41.601 UTC [106] DEBUG: oldest unfrozen transaction ID: 731, in database 1
2025-02-12 16:07:41.601 UTC [106] DEBUG: oldest MultiXactId: 1, in database 1
2025-02-12 16:07:41.601 UTC [106] DEBUG: commit timestamp Xid oldest/newest: 0/0
2025-02-12 16:07:41.601 UTC [106] LOG: database system was not properly shut down; automatic recovery in progress
2025-02-12 16:07:41.601 UTC [106] DEBUG: transaction ID wrap limit is 2147484378, limited by database with OID 1
2025-02-12 16:07:41.601 UTC [106] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2025-02-12 16:07:41.601 UTC [106] DEBUG: starting up replication slots
2025-02-12 16:07:41.601 UTC [106] DEBUG: xmin required by slots: data 0, catalog 0
2025-02-12 16:07:41.622 UTC [106] DEBUG: resetting unlogged relations: cleanup 1 init 0
2025-02-12 16:07:41.625 UTC [106] LOG: redo starts at 29/1A00A158
2025-02-12 16:07:41.625 UTC [106] LOG: invalid record length at 29/1A00A190: expected at least 24, got 0
2025-02-12 16:07:41.625 UTC [106] LOG: redo done at 29/1A00A158 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2025-02-12 16:07:41.625 UTC [106] DEBUG: resetting unlogged relations: cleanup 0 init 1
2025-02-12 16:07:41.638 UTC [106] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2025-02-12 16:07:41.638 UTC [106] DEBUG: MultiXact member stop limit is now 4294914944 based on MultiXact 1
2025-02-12 16:07:41.647 UTC [107] LOG: checkpoint starting: end-of-recovery immediate wait
2025-02-12 16:07:41.647 UTC [107] DEBUG: performing replication slot checkpoint
2025-02-12 16:07:41.693 UTC [107] DEBUG: checkpoint sync: number=1 file=pg_xact/0000 time=2.841 ms
2025-02-12 16:07:41.696 UTC [107] DEBUG: checkpoint sync: number=2 file=pg_multixact/offsets/0000 time=3.081 ms
2025-02-12 16:07:41.714 UTC [107] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.018 s, sync=0.007 s, total=0.076 s; sync files=2, longest=0.004 s, average=0.003 s; distance=0 kB, estimate=0 kB; lsn=29/1A00A190, redo lsn=29/1A00A190
2025-02-12 16:07:41.727 UTC [1] DEBUG: starting background worker process "logical replication launcher"
2025-02-12 16:07:41.727 UTC [110] DEBUG: autovacuum launcher started
2025-02-12 16:07:41.728 UTC [1] LOG: database system is ready to accept connections
2025-02-12 16:07:41.729 UTC [111] DEBUG: logical replication launcher started
The full query looks like this:
WITH geojson AS (
    SELECT ST_SetSRID(
        ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[11.989549827991663,51.48198164709467],[11.978044706259787,51.482274616326],[11.977576493157148,51.475090930653295],[11.989079812591411,51.4747980363145],[11.989549827991663,51.48198164709467]]]}'),
        4326
    ) AS geom
)
SELECT jsonb_build_object(
    'CityJsonFeature', cj_geometry.cityjsonfeature,
    'CityJson', jsonb_build_object(
        'type', 'CityJSON',
        'version', '2.0',
        'CityObjects', '{}'::jsonb,
        'vertices', '[]'::jsonb,
        'transform', jsonb_build_object(
            'scale', metadata.scale,
            'translate', metadata.translate
        ),
        'metadata', jsonb_build_object(
            'referenceSystem', 'EPSG:3857'
        )
    )
)
FROM cj_geometry
JOIN metadata ON cj_geometry.metadataid = metadata.metadataid
JOIN geojson ON
    cj_geometry.groundgeometry && ST_Transform(geojson.geom, 3857)
    AND ST_Contains(geojson.geom, ST_Transform(cj_geometry.groundgeometry, 4326));
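To get closer, I assume I can bisect the query myself: run the spatial join in isolation, without the JSON assembly, and see which stage segfaults. This is just a sketch of what I have in mind (same polygon and table/column names as above):

WITH geojson AS (
    SELECT ST_SetSRID(ST_GeomFromGeoJSON(
        '{"type":"Polygon","coordinates":[[[11.989549827991663,51.48198164709467],[11.978044706259787,51.482274616326],[11.977576493157148,51.475090930653295],[11.989079812591411,51.4747980363145],[11.989549827991663,51.48198164709467]]]}'
    ), 4326) AS geom
)
-- Step 1: bounding-box filter only. If this survives, the && operator and
-- the GiST index lookup are probably fine.
SELECT count(*)
FROM cj_geometry
JOIN geojson ON cj_geometry.groundgeometry && ST_Transform(geojson.geom, 3857);

-- Step 2: add the exact predicate back. If only this version crashes, then
-- ST_Transform/ST_Contains on some stored geometry is the likely suspect:
--   ... AND ST_Contains(geojson.geom, ST_Transform(cj_geometry.groundgeometry, 4326));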
The crash happens the same way on three different systems with the same database (restored from a backup). What the hell is going on here?
Any idea how I could get closer to solving the problem? Is there any kind of error boundary I can set in PostgreSQL, or any way to get better debug output?
Thanks a lot.
EDIT:
Server machines: Debian 12, 64 GB RAM, 12-core AMD EPYC (dedicated server hosted by Hetzner), PostgreSQL 17, PostGIS 3.5 in a Docker container (postgis/postgis:17-3.5).
Yesterday I had to reinstall PostgreSQL (same version, same Docker container) because another messed-up Docker container had used up 100% of the disk, so nothing worked anymore; I kept the same persistent data directory. It feels like a memory issue, but the machine has 99% free memory. The table cj_geometry has about 70,000,000 rows and these indexes (given the disk-full incident I also suspect on-disk corruption; see the sketch after the index list):
CREATE INDEX IF NOT EXISTS metadata_id ON metadata USING hash(metadataid);
CREATE INDEX IF NOT EXISTS metadata_filename ON metadata USING hash(filename);
-- CREATE INDEX IF NOT EXISTS idx_metadata_filename_source ON metadata USING hash(filename, source);
-- city_object indexes
CREATE INDEX IF NOT EXISTS cj_geometry_id ON cj_geometry USING hash(metadataid);
CREATE INDEX IF NOT EXISTS city_object_location_gix ON cj_geometry USING gist(location);
CREATE INDEX IF NOT EXISTS city_object_bbox_gix ON cj_geometry USING gist(bbox);
CREATE INDEX IF NOT EXISTS city_object_groundgeometry_gix ON cj_geometry USING gist(groundgeometry);
CLUSTER cj_geometry USING city_object_location_gix;
CLUSTER cj_geometry USING city_object_bbox_gix;
CLUSTER cj_geometry USING city_object_groundgeometry_gix;
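Because the box ran completely full before the reinstall, my working theory is that something on disk got corrupted. This is how I would check, assuming I understand the tooling correctly (amcheck ships with PostgreSQL; as far as I know it verifies the heap and B-tree indexes but not GiST or hash, so I would just rebuild those):

-- Verify the table's heap pages (can take a while on ~70M rows):
CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT * FROM verify_heapam('cj_geometry');

-- amcheck cannot verify GiST/hash indexes, so rebuild the one the
-- crashing query most likely uses:
REINDEX INDEX city_object_groundgeometry_gix;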
The response should be one JSON object, maybe around 100-500 kB in size. The settings for the database are essentially:
# OS Type: linux
# DB Type: dw
# Total Memory (RAM): 64 GB
# CPUs num: 16
# Data Storage: ssd
max_connections = 40
shared_buffers = 16GB
effective_cache_size = 48GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 26214kB
huge_pages = try
min_wal_size = 4GB
max_wal_size = 16GB
max_worker_processes = 16
max_parallel_workers_per_gather = 8
max_parallel_workers = 16
max_parallel_maintenance_workers = 4
# General Settings
listen_addresses = '*'
log_timezone = 'Etc/UTC'
datestyle = 'iso, mdy'
timezone = 'Etc/UTC'
lc_messages = 'en_US.utf8' # locale for system error message
lc_monetary = 'en_US.utf8' # locale for monetary formatting
lc_numeric = 'en_US.utf8' # locale for number formatting
lc_time = 'en_US.utf8' # locale for time formatting
default_text_search_config = 'pg_catalog.english'
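One more thing I intend to try, in case the parallel-query or JIT code paths are at fault (pure guesswork on my side, but max_parallel_workers_per_gather = 8 is set above, jit is on by default as far as I know, and both are easy to switch off per session):

-- Disable the parallel executor and the JIT compiler for this session,
-- then re-run the failing query; if it survives, one of those paths is implicated.
SET max_parallel_workers_per_gather = 0;
SET jit = off;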