Postgres segfault when running pg_upgrade

I'm getting a segfault while upgrading Postgres. Is it memory related, and how can I work around it?

Ubuntu Jammy, with Postgres 10.x and 14.10 both installed. I'm trying to upgrade a geospatial database from pg10 with PostGIS 3.2.3 to 14.10 with PostGIS 3.4.0. The server has 32 GB of RAM.

I understand that Postgres and PostGIS make upgrade paths a major headache; they even tend to break simple dump and restores. But this is really odd...

Running pg_upgrade with --check confirms that the clusters are compatible.

Then I run the upgrade:

    /usr/lib/postgresql/14/bin/pg_upgrade \
      --old-datadir=/var/lib/postgresql/10/main \
      --new-datadir=/var/lib/postgresql/14/production \
      --old-bindir=/usr/lib/postgresql/10/bin \
      --new-bindir=/usr/lib/postgresql/14/bin \
      --old-options '-c config_file=/etc/postgresql/10/main/postgresql.conf' \
      --new-options '-c config_file=/etc/postgresql/14/production/postgresql.conf' \
      --old-port=5432 \
      --new-port=5433 \
      --jobs=4

But when I run the upgrade I get:

-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for system-defined composite types in user tables  ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for removed "abstime" data type in user tables     ok
Checking for removed "reltime" data type in user tables     ok
Checking for removed "tinterval" data type in user tables   ok
Checking for user-defined encoding conversions              ok
Checking for user-defined postfix operators                 ok
Checking for incompatible polymorphic functions             ok
Checking for tables WITH OIDS                               ok
Checking for invalid "sql_identifier" user columns          ok
Creating dump of global objects                             ok
Creating dump of database schemas
                                                            ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok
Checking for new cluster tablespace directories             ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Analyzing all rows in the new cluster                       ok
Freezing all rows in the new cluster                        ok
Deleting files from new pg_xact                             ok
Copying old pg_xact to new server                           ok
Setting oldest XID for new cluster                          ok
Setting next transaction ID and epoch for new cluster       ok
Deleting files from new pg_multixact/offsets                ok
Copying old pg_multixact/offsets to new server              ok
Deleting files from new pg_multixact/members                ok
Copying old pg_multixact/members to new server              ok
Setting next multixact ID and offset for new cluster        ok
Resetting WAL archives                                      ok
Setting frozenxid and minmxid counters in new cluster       ok
Restoring global objects in the new cluster                 ok
Restoring database schemas in the new cluster
  enhanced_lod                                              
failure

The failure sometimes happens at 'Analyzing all rows' and sometimes farther along, which hints that something outside Postgres is failing. During the upgrade, output goes to pg_upgrade_internal.log and not to the general Postgres log. The pg_upgrade_internal.log shows:

command: "/usr/lib/postgresql/14/bin/vacuumdb" --host /tmp --port 5433 --username postgres --all --analyze  >> "pg_upgrade_utility.log" 2>&1
vacuumdb: vacuuming database "postgres"
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
vacuumdb: error: connection to server on socket "/tmp/.s.PGSQL.5433" failed: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

So it looks like a dropped connection, but the upgrade is running locally. That led me to look at dmesg, where I found segfaults.

dmesg shows segfault:

[  598.681830] Code: 99 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 12 4a 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 98 49 17 00 64 48 83 3a
[ 1098.583919] postgres[2564]: segfault at 5642ad8ca319 ip 00007fadbbd593fe sp 00007fff60caf8a0 error 4 in libc.so.6[7fadbbcdc000+195000]
[ 1098.583931] Code: 99 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 12 4a 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 98 49 17 00 64 48 83 3a
[ 1573.936279] postgres[3039]: segfault at 55b7c3eb12ef ip 00007f5fab8373fe sp 00007fff3254c2c0 error 4 in libc.so.6[7f5fab7ba000+195000]
[ 1573.936291] Code: 99 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 12 4a 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 98 49 17 00 64 48 83 3a
[ 1949.345253] postgres[3778]: segfault at 5617068fc411 ip 00007f0e14b0e3fe sp 00007ffca5a05b90 error 4 in libc.so.6[7f0e14a91000+195000]
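
As a rough check on whether these are the same crash each time, the offset of the faulting instruction inside libc.so.6 can be worked out by subtracting the mapping base (the first number in the brackets) from the ip value. This is only a sketch, assuming bash; the helper name `libc_offset` is made up for illustration, and the values are copied from the three complete dmesg lines above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: offset of the faulting instruction inside libc.so.6,
# i.e. instruction pointer (ip) minus the mapping base shown in brackets.
libc_offset() {
    local ip="$1" base="$2"
    # Both arguments are hex strings without the 0x prefix, as dmesg prints them.
    printf '0x%x\n' $(( 0x$ip - 0x$base ))
}

# ip and base values taken from the dmesg output above
libc_offset 00007fadbbd593fe 7fadbbcdc000
libc_offset 00007f5fab8373fe 7f5fab7ba000
libc_offset 00007f0e14b0e3fe 7f0e14a91000
```

All three lines print 0x7d3fe, so every crash lands on the same instruction in libc. The Code: bytes are also identical across the dumps.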

So what do I do? I'm a SQL guy, not a software guy. All I know is that segfaults are really bad.
