Reputation: 563
I am working on a long-running python process that performs a lot of database access (mostly reads, occasional writes). Sometimes it may be necessary to terminate the process before it finishes (e.g. by using the kill
command) and when this happens I would like to log a value to the database indicating that the particular run was canceled. (I am also logging the occurrence to a log file; I would like to have the information in both places.)
I have found that if I interrupt the process while the database connection is active, the connection becomes unusable; specifically, it hangs the process if I try to use it in any way.
The actual application is rather large and complex, but this snippet reproduces the problem reliably.
The table test
in the database has two columns, id
(serial) and message
(text). I prepopulated it with one row so the UPDATE
statement below would have something to change.
import psycopg2
import sys
import signal
pg_host = 'localhost'
pg_user = 'redacted'
pg_password = 'redacted'
pg_database = 'test_db'
def write_message(msg):
print "Writing: " + msg
cur.execute("UPDATE test SET message = %s WHERE id = 1", (msg,))
conn.commit()
def signal_handler(signal, frame):
write_message('Interrupt!')
sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
if __name__ == '__main__':
conn = psycopg2.connect(host=pg_host, user=pg_user, password=pg_password, database=pg_database)
cur = conn.cursor()
write_message("Starting")
for i in xrange(10000):
# I press ^C somewhere in here
cur.execute("SELECT * FROM test")
cur.fetchall()
write_message("Finishing")
When I run this script without interruption, it completes as expected. That is, the row in the database is updated to say "Starting" then "Finishing".
If I press ctrl-C
during the loop indicated by the comment, python hangs indefinitely. It no longer responds to keyboard input, and the process has to be killed from elsewhere. Looking in my postgresql log, the UPDATE
statement with "Interrupted!" is never received by the database server.
If I add a debugging breakpoint at the beginning of signal_handler() I can see that doing almost anything with the database connection at that point causes the same hang. Trying to execute
a SELECT
, issuing a conn.rollback()
, conn.commit()
, conn.close()
or conn.reset()
all cause the hang. Executing conn.cancel()
does not cause a hang, but it doesn't improve the situation; subsequent use of the connection still causes a hang. If I remove the database access from write_message()
then the script is able to exit gracefully when interrupted, so the hang is definitely database connection related.
Also worth noting: if I change the script so that I am interrupting something other than database activity, it works as desired, logging "Interrupted!" to the database. E.g., if I replace the for i in xrange(10000)
loop with a simple sleep(10)
and interrupt that, it works fine. So the problem seems to be specifically related to interrupting psycopg2 with a signal while it is performing database access, then trying to use the connection.
Is there any way to salvage the existing psycopg2 connection and use it to update the database after this kind of interruption?
If not, is there at least a way to terminate it cleanly so if some subsequent code tried to use it, it wouldn't cause a hang?
Finally, is this somehow expected behavior, or is it a bug that should be reported? It makes sense to me that the connection could be in a bad state after this kind of interruption, but ideally it would throw an exception indicating the problem rather than hanging.
In the meantime, I have discovered that if I create an entirely new connection with psycopg2.connect()
after the interrupt and am careful not to access the old one, I can still update the database from the interrupted process. This is probably what I'll do for now, but it feels untidy.
Upvotes: 3
Views: 1112
Reputation: 563
I filed an issue for this on the psycopg2 github and received a helpful response from the developer. In summary:
psycopg2.extensions.set_wait_callback(psycopg2.extras.wait_select)
improves the situation a bit (at least in my environment) by causing execute()
statements called from within the signal handler to throw an exception rather than hang. However, doing other things with the conneciton (e.g. reset()
) still caused a hang for me, so ultimately it's still best to just create a new connection within the signal handler rather than trying to salvage the existing one.Upvotes: 1