Reputation: 2099
I'm doing something along the lines of:
conn_string = "postgresql+pg8000://%s:%s@%s:%d/%s" % (db_user, db_pass, host, port, schema)
engine = sqlalchemy.create_engine(conn_string, execution_options={'autocommit': True},
                                  encoding='utf-8', isolation_level="AUTOCOMMIT")
conn = engine.connect()
rows = conn.execute(sql_query)
to run queries on a Redshift cluster. Lately, I've been doing maintenance tasks such as running VACUUM REINDEX on large tables that get truncated and reloaded every day.
The problem is that the command above takes around 7 minutes for one particular table (the table is huge: 60 million rows across 15 columns), and when I run it using the method above it never finishes; it just hangs. In the cluster dashboard in AWS I can see parts of the vacuum running for about 5 minutes, and then it just stops. No Python errors, no errors on the cluster, nothing at all.
My guess is that the connection is lost during the command. So, how do I prove my theory? Has anybody else run into this? And what do I change in the connection string to keep it alive longer?
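One way to test the theory would be to open a second, independent session while the Python call is hanging and check whether the vacuum is still progressing on the cluster side. This is only a rough sketch, reusing conn_string from above and assuming the Redshift system view SVV_VACUUM_PROGRESS is visible to the user:
import sqlalchemy

# Separate probe engine so the hung connection is not reused.
probe = sqlalchemy.create_engine(conn_string)
with probe.connect() as c:
    rows = c.execute(sqlalchemy.text(
        "select table_name, status, time_remaining_estimate "
        "from svv_vacuum_progress"))
    for row in rows:
        print(row)  # if the vacuum shows up here after the client hangs, the work continues server-side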
Upvotes: 9
Views: 3042
Reputation: 149813
It's a common issue when you are behind a NAT proxy or a firewall to be disconnected due to network inactivity (such as waiting for a long-running query to finish). NAT proxies and firewalls can only keep a finite number of connections in their memory, and the most common and logical policy is to keep the newest connections and discard old or inactive ones first.
A general solution to this problem is to use TCP keepalive packets so that intermediate hosts don't treat the connection as idle and close it. psycopg2 allows you to pass keepalive parameters for a connection through to libpq, so you could add them to the connect_args dict for create_engine(), e.g.:
conn = sqlalchemy.create_engine(
    conn_string,
    connect_args={
        "keepalives": 1,           # enable TCP keepalives
        "keepalives_idle": 30,     # seconds of inactivity before the first keepalive
        "keepalives_interval": 5,  # seconds between keepalive retransmissions
        "keepalives_count": 5,     # max number of keepalives to transmit before dropping
    },
)
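Note that these keepalives* settings are libpq connection parameters understood by psycopg2; the question's URL uses the pg8000 driver, so the URL would need to point at psycopg2 for connect_args like these to apply. A minimal end-to-end sketch, assuming psycopg2 is installed, reusing the question's variables, and using a hypothetical table name:
import sqlalchemy

conn_string = "postgresql+psycopg2://%s:%s@%s:%d/%s" % (db_user, db_pass, host, port, schema)
engine = sqlalchemy.create_engine(
    conn_string,
    isolation_level="AUTOCOMMIT",   # VACUUM cannot run inside a transaction block
    connect_args={
        "keepalives": 1,
        "keepalives_idle": 30,
        "keepalives_interval": 5,
        "keepalives_count": 5,
    },
)
with engine.connect() as conn:
    conn.execute(sqlalchemy.text("vacuum reindex my_big_table"))  # hypothetical table name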
Upvotes: 1
Reputation: 643
It's most likely not a connection drop issue. To confirm this, try pushing a few million rows into a dummy table (something that takes more than 5 minutes) and see whether that statement fails the same way. Once a query has been submitted to Redshift, it keeps executing in the background regardless of whether your client connection is closed.
Now, coming to the problem itself: my guess is that you are running out of memory or disk space. Can you elaborate on your Redshift setup (how many dc1/ds2 nodes)? Also, try running some admin queries and see how much space you have left on the disk. When the cluster is loaded to the brim a disk-full error is thrown, but in your case the connection might be dropped well before that error ever reaches your Python shell.
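For example, one common admin query checks overall disk usage via the stv_partitions system table. A sketch only, assuming your user can read that table and reusing the question's conn_string:
import sqlalchemy

engine = sqlalchemy.create_engine(conn_string)
with engine.connect() as c:
    pct = c.execute(sqlalchemy.text(
        "select sum(used)::float / sum(capacity) * 100 from stv_partitions"
    )).scalar()
    print("disk used: %.1f%%" % pct)  # a value near 100% would point at a disk-full problem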
Upvotes: 1