Jan Spurny
Jan Spurny

Reputation: 5537

Psycopg2 db connection hangs on lost network connection

Problem description

I'm using psycopg2 to connect to my PostgreSQL database on a remote host. I open a connection and wait for requests, then for each request I run queries on the connection and return data.

But when the network connection is lost after the connection is already open the next db query hangs and I have to kill the program manually.

Details:

What I want/need

I need some reliable way to detect a failed connection before running a query, so my program won't hang, or a way to make cursor.execute(..) raise an exception on failed connection.

Example:

import psycopg2
import time

conn = psycopg2.connect("host='dbs' dbname='foo' user='joe' password='x'")
time.sleep(10) # I manually turn VPN off during this sleep..
cu = conn.cursor()
cu.execute('SELECT 1') # <- hangs here
print cu.fetchone()
cu.commit()

What have I tried (and what didn't work):

Upvotes: 20

Views: 15004

Answers (4)

Nick
Nick

Reputation: 163

OP's and Gabriel Salla's solutions which configure KEEPALIVE are not complete. This solution only works when the connection is idle (no data sent before network became down) and the network became down.

If some data have already sent over the network that is already down but not yet detected so by the KEEPALIVE feature there will be a hang. This happens because the RTO mechanism is used instead of KEEPALIVE when some data is send.

To set timeout for an RTO you must set TCP_USER_TIMEOUT timeout (in milliseconds) for socket.

The complete solution is (both KEEPALIVE and RTO timeouts configured to 10 seconds):

s = socket.fromfd(conn.fileno(), socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 6)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 2)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 2)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 10000)

Upvotes: 2

Justin
Justin

Reputation: 3624

After a long and brutal struggle, I think I fixed this issue by simply doing the strategy others are talking about, but using the psycopg2 connect function itself:


from psycopg2 import connect


conn = connect(
        database=database,
        user=username,
        password=password,
        host=hostname,
        port=port,
        connect_timeout=3,
        # https://www.postgresql.org/docs/9.3/libpq-connect.html
        keepalives=1,
        keepalives_idle=5,
        keepalives_interval=2,
        keepalives_count=2)

I was seeing psycopg2 hang consistently on long-running queries, but now the issue seems to be fully resolved.

Note this may be new functionality, since this question is old.

Upvotes: 10

Gabriel Salla
Gabriel Salla

Reputation: 126

Took a look at the socket timeout and after reading this and this, these settings worked for me

s = socket.fromfd(connection.fileno(),
                  socket.AF_INET, socket.SOCK_STREAM)
# Enable sending of keep-alive messages
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Time the connection needs to remain idle before start sending
# keepalive probes
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, int(ceil(time)))
# Time between individual keepalive probes
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 1)
# The maximum number of keepalive probes should send before dropping
# the connection
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)

Upvotes: 0

Diptesh Chatterjee
Diptesh Chatterjee

Reputation: 385

In order to make sure a connection is still valid, read the property connection.isolation_level. This will raise an OperationalError with pgcode == "57P01" in case the connection is dead.

try: connection.isolation_level except OperationalError as oe: conn = psycopg2.connect(dsn)

Upvotes: 0

Related Questions