pcace

Reputation: 670

PostgreSQL / PostGIS DB crashes on query (which used to work)

I am having a problem with a PostgreSQL DB. It crashes completely on a query that used to work fine some time ago:

2025-02-12 16:07:39.937 UTC [1] LOG:  server process (PID 104) was terminated by signal 11: Segmentation fault
2025-02-12 16:07:39.937 UTC [1] DETAIL:  Failed process was running: WITH geojson AS (
            SELECT
              ST_SetSRID(
                ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[11.989549827991663,51.48198164709467],[11.978044706259787,51.482274616326],[11.977576493157148,51.475090930653295],[11.989079812591411,51.4747980363145],[11.989549827991663,51.48198164709467]]]}'), 4326
              ) AS geom
          )
          SELECT jsonb_build_object(
                    'CityJsonFeature', cj_geometry.cityjsonfeature,
                    'CityJson', jsonb_build_object(
                      'type', 'CityJSON',
                      'version', '2.0',
                      'CityObjects', '{}'::jsonb,
                      'vertices', '[]'::jsonb,
                      'transform', jsonb_build_object(
                        'scale', metadata.scale,
                        'translate', metadata.translate
                        ),
                      'metadata', jsonb_build_object(
                        'referenceSystem', 'EPSG:3857'
                      )
                    )
                  )
    
          FROM cj_geometry
          JOIN m
2025-02-12 16:07:39.937 UTC [1] LOG:  terminating any other active server processes
2025-02-12 16:07:39.939 UTC [1] LOG:  all server processes terminated; reinitializing
2025-02-12 16:07:39.964 UTC [106] LOG:  database system was interrupted; last known up at 2025-02-12 16:02:24 UTC
2025-02-12 16:07:41.601 UTC [106] DEBUG:  checkpoint record is at 29/1A00A0E0
2025-02-12 16:07:41.601 UTC [106] DEBUG:  redo record is at 29/1A00A0E0; shutdown true
2025-02-12 16:07:41.601 UTC [106] DEBUG:  next transaction ID: 908; next OID: 23292321
2025-02-12 16:07:41.601 UTC [106] DEBUG:  next MultiXactId: 1; next MultiXactOffset: 0
2025-02-12 16:07:41.601 UTC [106] DEBUG:  oldest unfrozen transaction ID: 731, in database 1
2025-02-12 16:07:41.601 UTC [106] DEBUG:  oldest MultiXactId: 1, in database 1
2025-02-12 16:07:41.601 UTC [106] DEBUG:  commit timestamp Xid oldest/newest: 0/0
2025-02-12 16:07:41.601 UTC [106] LOG:  database system was not properly shut down; automatic recovery in progress
2025-02-12 16:07:41.601 UTC [106] DEBUG:  transaction ID wrap limit is 2147484378, limited by database with OID 1
2025-02-12 16:07:41.601 UTC [106] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2025-02-12 16:07:41.601 UTC [106] DEBUG:  starting up replication slots
2025-02-12 16:07:41.601 UTC [106] DEBUG:  xmin required by slots: data 0, catalog 0
2025-02-12 16:07:41.622 UTC [106] DEBUG:  resetting unlogged relations: cleanup 1 init 0
2025-02-12 16:07:41.625 UTC [106] LOG:  redo starts at 29/1A00A158
2025-02-12 16:07:41.625 UTC [106] LOG:  invalid record length at 29/1A00A190: expected at least 24, got 0
2025-02-12 16:07:41.625 UTC [106] LOG:  redo done at 29/1A00A158 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2025-02-12 16:07:41.625 UTC [106] DEBUG:  resetting unlogged relations: cleanup 0 init 1
2025-02-12 16:07:41.638 UTC [106] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2025-02-12 16:07:41.638 UTC [106] DEBUG:  MultiXact member stop limit is now 4294914944 based on MultiXact 1
2025-02-12 16:07:41.647 UTC [107] LOG:  checkpoint starting: end-of-recovery immediate wait
2025-02-12 16:07:41.647 UTC [107] DEBUG:  performing replication slot checkpoint
2025-02-12 16:07:41.693 UTC [107] DEBUG:  checkpoint sync: number=1 file=pg_xact/0000 time=2.841 ms
2025-02-12 16:07:41.696 UTC [107] DEBUG:  checkpoint sync: number=2 file=pg_multixact/offsets/0000 time=3.081 ms
2025-02-12 16:07:41.714 UTC [107] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.018 s, sync=0.007 s, total=0.076 s; sync files=2, longest=0.004 s, average=0.003 s; distance=0 kB, estimate=0 kB; lsn=29/1A00A190, redo lsn=29/1A00A190
2025-02-12 16:07:41.727 UTC [1] DEBUG:  starting background worker process "logical replication launcher"
2025-02-12 16:07:41.727 UTC [110] DEBUG:  autovacuum launcher started
2025-02-12 16:07:41.728 UTC [1] LOG:  database system is ready to accept connections
2025-02-12 16:07:41.729 UTC [111] DEBUG:  logical replication launcher started

The full query looks like this:

      WITH geojson AS (
        SELECT
          ST_SetSRID(
            ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[11.989549827991663,51.48198164709467],[11.978044706259787,51.482274616326],[11.977576493157148,51.475090930653295],[11.989079812591411,51.4747980363145],[11.989549827991663,51.48198164709467]]]}'), 4326
          ) AS geom
      )
      SELECT jsonb_build_object(
                'CityJsonFeature', cj_geometry.cityjsonfeature,
                'CityJson', jsonb_build_object(
                  'type', 'CityJSON',
                  'version', '2.0',
                  'CityObjects', '{}'::jsonb,
                  'vertices', '[]'::jsonb,
                  'transform', jsonb_build_object(
                    'scale', metadata.scale,
                    'translate', metadata.translate
                    ),
                  'metadata', jsonb_build_object(
                    'referenceSystem', 'EPSG:3857'
                  )
                )
              )

      FROM cj_geometry
      JOIN metadata ON cj_geometry.metadataid = metadata.metadataid
      JOIN geojson ON
        cj_geometry.groundgeometry && ST_Transform(geojson.geom, 3857)
        AND ST_Contains(geojson.geom, ST_Transform(cj_geometry.groundgeometry, 4326));
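
If it helps with narrowing things down, I suppose I could bisect the query, e.g. run only the bounding-box join first and then add ST_Contains back in. Just a sketch, I have not actually run this yet:

      -- Sketch (untested): same polygon and table as above, but only the && bounding-box join.
      -- If this runs fine, adding the ST_Contains condition back should show whether that part segfaults.
      WITH geojson AS (
        SELECT ST_SetSRID(
          ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[11.989549827991663,51.48198164709467],[11.978044706259787,51.482274616326],[11.977576493157148,51.475090930653295],[11.989079812591411,51.4747980363145],[11.989549827991663,51.48198164709467]]]}'), 4326
        ) AS geom
      )
      SELECT count(*)
      FROM cj_geometry
      JOIN geojson ON cj_geometry.groundgeometry && ST_Transform(geojson.geom, 3857);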


The crash happens the same way on three different systems with the same database (restored from a backup). What the hell is going on here?

Any idea how I would get closer to solving the problem? Is there any kind of error boundary I can set in PostgreSQL, or any way to get better debug output?
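
One idea I had, assuming a broken or corrupt geometry could be what makes the backend crash, is to look for invalid ground geometries first (again just a sketch):

      -- Sketch, assuming an invalid/corrupt geometry might be the trigger:
      -- list a few invalid ground geometries, if there are any.
      SELECT ctid, ST_IsValidReason(groundgeometry)
      FROM cj_geometry
      WHERE NOT ST_IsValid(groundgeometry)
      LIMIT 20;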

Thanks a lot

EDIT:

Server machines: Debian 12, 64 GB RAM, 12-core Epyc (dedicated server hosted by Hetzner), PostgreSQL 17, PostGIS 3.5 in a Docker container (postgis/postgis:17-3.5)

Yesterday I had to reinstall PostgreSQL (same version, same Docker container) because another Docker container got messed up and used 100% of the disk, so nothing worked anymore; I reused the same persistent data directory. It feels like a memory issue, but the machine has 99% free memory. The table cj_geometry has about 70,000,000 rows and the following indexes (see the corruption-check sketch right after the index list):

  CREATE INDEX IF NOT EXISTS metadata_id ON metadata USING hash(metadataid);
  CREATE INDEX IF NOT EXISTS metadata_filename ON metadata USING hash(filename);
  -- CREATE INDEX IF NOT EXISTS idx_metadata_filename_source ON metadata USING hash(filename, source);
  -- city_object indexes
  CREATE INDEX IF NOT EXISTS cj_geometry_id ON cj_geometry USING hash(metadataid);
  CREATE INDEX IF NOT EXISTS city_object_location_gix ON cj_geometry USING gist(location);
  CREATE INDEX IF NOT EXISTS city_object_bbox_gix ON cj_geometry USING gist(bbox);
  CREATE INDEX IF NOT EXISTS city_object_groundgeometry_gix ON cj_geometry USING gist(groundgeometry);
  CLUSTER cj_geometry USING city_object_location_gix;
  CLUSTER cj_geometry USING city_object_bbox_gix;
  CLUSTER cj_geometry USING city_object_groundgeometry_gix;
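
Since the data directory survived the disk-full incident and was reused, I am also wondering whether something on disk got corrupted. Would something along these lines be a reasonable way to check and rebuild (sketch, assuming the amcheck extension is available in the postgis/postgis image)?

  -- Sketch, assuming possible on-disk corruption after the disk-full incident.
  -- amcheck ships with the PostgreSQL contrib modules; verify_heapam checks the table pages.
  CREATE EXTENSION IF NOT EXISTS amcheck;
  SELECT * FROM verify_heapam('cj_geometry');
  -- amcheck does not cover GiST or hash indexes, so rebuild those instead:
  REINDEX TABLE CONCURRENTLY cj_geometry;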

The response should be a single JSON object, maybe around 100-500 kB in size. The settings for the database are essentially:

# OS Type: linux
# DB Type: dw
# Total Memory (RAM): 64 GB
# CPUs num: 16
# Data Storage: ssd


max_connections = 40
shared_buffers = 16GB
effective_cache_size = 48GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 26214kB
huge_pages = try
min_wal_size = 4GB
max_wal_size = 16GB
max_worker_processes = 16
max_parallel_workers_per_gather = 8
max_parallel_workers = 16
max_parallel_maintenance_workers = 4

# General Settings

listen_addresses = '*'

log_timezone = 'Etc/UTC'

datestyle = 'iso, mdy'
timezone = 'Etc/UTC'
lc_messages = 'en_US.utf8'      # locale for system error message
lc_monetary = 'en_US.utf8'      # locale for monetary formatting
lc_numeric = 'en_US.utf8'       # locale for number formatting
lc_time = 'en_US.utf8'          # locale for time formatting
default_text_search_config = 'pg_catalog.english'


Upvotes: 2

Views: 83

Answers (0)
