Reputation: 6530

how do I gracefully kill stale server process postgres

Occasionally in our lab, our postgres 8.3 database will get orphaned from the pid file, and we get this message when trying to shut down the database:

Error: pid file is invalid, please manually kill the stale server process postgres

When this happens, we immediately do a pg_dump so we can restore the database later. But, if we just kill -9 the orphan postgres process and then start it, the database starts only with the data from the last successful shutdown. But if you psql to it before killing it, the data is all available, thus why the pg_dump works.

Is there a way to gracefully shutdown the orphaned postgres process so we don't have to go through the pg_dump and restore? Or is there a way to have the database recover after killing the orphaned process?

Upvotes: 3

Answers (3)

Dan Benamy

Reputation: 837

I use a script like the following run by cron every minute.

#!/bin/bash

DB="YOUR_DB"

# Here's a snippet to watch how long each connection to the db has been open:
#     watch -n 1 'ps -o pid,cmd,etime -C postgres | grep $DB'

# This program kills any postgres workers/connections to the specified database
# which have been running for 2 or 3 minutes. It actually kills workers which
# have an elapsed time including "02:" or "03:". That'll be anything running
# for at least 2 minutes and less than 4. It'll also cover anything that
# managed to stay around until an hour and 2 or 3 minutes, etc.
#
# Run this once a minute via cron and it should catch any connection open
# between 2 and 3 minutes. You can temporarily disable it if if you need to run
# a long connection once in a while.
#
# The check for "03:" is in case there's a little lag starting the cron job and
# the timing is really bad and it never sees a worker in the 1 minute window
# when it's got "02:".
old=$(ps -o pid,cmd,etime -C postgres | grep "$DB" | egrep '0[23]:')
if [ -n "$old" ]; then
    echo "Killing:"
    echo "$old"
    echo "$old" | awk '{print $1}' | xargs -I {} kill {}
fi

Upvotes: -1

Magnus Hagander

Reputation: 25118

Never use kill -9.

And I would strongly advice you to try to figure out exactly how this happens. Where exactly does the error message come from? It's not a PostgreSQL error message. Are you by any chance mixing different ways to start/stop the server (initscripts sometimes and pg_ctl sometimes, for example)? That could probably cause things to go out of sync.

But to answer the direct question - use a regular kill (no -9) on the process to shut it down. Make sure you kill all the postgres processes if there is more than one running.

The database will always do an automatic recovery whenever it's shut down. This shuold happen with kill -9 as well - any data that is committed should be up there. This almost sounds like you have two different data directories mounted on top of each other or something like that - this has been a known issue with NFS at least before.

Upvotes: 3

Milen A. Radev

Reputation: 62613

According to the documentation you could either send SIGTERM or SIGQUIT. SIGTERM is preferred. Either way never use SIGKILL (as you know from personal experience).

Edit: on the other hand what you experience is not normal and could indicate a mis-configuration or a bug. Please, ask for assistance on the pgsql-admin mailing list.

Upvotes: 4

how do I gracefully kill stale server process postgres

Answers (3)

Related Questions