Reputation: 2099
I'm trying to run VACUUM REINDEX
for some huge tables in Redshift. When I run one of these vacuums in SQL Workbench/J, it never finishes and returns a "connection reset by peer" error after about two hours. The same thing happens in Python when I run the vacuums with something like this:
conn_string = "postgresql+pg8000://%s:%s@%s:%d/%s" % (db_user, db_pass, host, port, schema)
conn = sqlalchemy.create_engine(
    conn_string,
    execution_options={'autocommit': True},
    encoding='utf-8',
    connect_args={"keepalives": 1,
                  "keepalives_idle": 60,
                  "keepalives_interval": 60},
    isolation_level="AUTOCOMMIT")
conn.execute(query)
Is there a way, using either Python or SQL Workbench/J, to run these queries? I expect each to take at least an hour. Is this expected behavior?
Upvotes: 1
Views: 1899
Reputation: 156
You might need to add a retry mechanism to your Python script for when the reindexing fails. Per https://docs.aws.amazon.com/redshift/latest/dg/r_VACUUM_command.html:
If a VACUUM REINDEX operation terminates before it completes, the next VACUUM resumes the reindex operation before performing the full vacuum operation.
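Since an interrupted VACUUM REINDEX resumes where it left off, a simple retry loop is enough. A minimal sketch, assuming you reconnect inside the callable so each attempt gets a fresh connection (the helper name and retry parameters are my own, not from the question):

```python
import time

def run_with_retries(operation, max_attempts=5, base_delay=60):
    """Run `operation`, retrying on failure.

    Safe for VACUUM REINDEX because Redshift resumes an
    interrupted reindex on the next VACUUM of the same table.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:  # e.g. a dropped connection
            if attempt == max_attempts:
                raise
            # Back off a little longer on each attempt before retrying.
            time.sleep(base_delay * attempt)

# Hypothetical usage with the engine from the question:
# run_with_retries(lambda: conn.execute("VACUUM REINDEX my_table"))
```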
A couple of things to note (apologies if you already know this):
If your table uses an interleaved sort key, first check whether you even need to reindex. Sample query:
SELECT tbl AS table_id,
(col + 1) AS column_num, -- Column in this view is zero indexed
interleaved_skew,
last_reindex
FROM svv_interleaved_columns
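To act on that query's results programmatically, you can filter for high skew. An interleaved_skew of 1.0 means no skew; the AWS docs suggest reindexing once skew grows well above that (1.4 is a commonly cited cutoff). A small helper as a sketch, with names and the threshold default being my own choices:

```python
def tables_needing_reindex(rows, skew_threshold=1.4):
    """Given (table_id, column_num, interleaved_skew) rows from
    svv_interleaved_columns, return the table ids whose skew
    exceeds the threshold and are worth a VACUUM REINDEX.
    """
    return sorted({table_id for table_id, _col, skew in rows
                   if skew is not None and skew > skew_threshold})

# rows would normally come from running the query above, e.g.:
# rows = conn.execute(
#     "SELECT tbl, col, interleaved_skew FROM svv_interleaved_columns"
# ).fetchall()
```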
Upvotes: 3