Ryan

Reputation: 305

Python script hangs when executing long-running query, even after the query completes

I've got a Python script that loops through folders and, within each folder, executes each SQL file against our Redshift cluster (using psycopg2). Here is the code that does the loop (note: this works just fine for queries that take only a few minutes to execute):

for folder in dir_list: 
    #Each query is stored in a folder by group, so we have to go through each folder and then each file in that folder
    file_list = os.listdir(source_dir_wkly + "\\" + str(folder))

    for f in file_list:
        src_filename = source_dir_wkly + "\\" + str(folder) + "\\" + str(f)
        dest_filename = dest_dir_wkly + "\\" + os.path.splitext(os.path.basename(src_filename))[0] + ".csv"
        result = dal.execute_query(src_filename)
        result.to_csv(path_or_buf=dest_filename,index=False)

execute_query is a method stored in another file:

def execute_query(self, source_path):
    conn_rs = psycopg2.connect(self.conn_string)
    cursor = conn_rs.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    sql_file = self.read_sql_file(source_path)
    cursor.execute(sql_file)
    records = cursor.fetchall()
    conn_rs.commit()
    return pd.DataFrame(data=records)

def read_sql_file(self, path):
    with open(path, 'r') as f:
        return f.read()

I have a couple of queries that take around 15 minutes to execute (not unusual given the size of the data in our Redshift cluster), and they run just fine in SQL Workbench. I can see in the AWS Console that the query has completed, but the script just hangs: it never dumps the results to a CSV file, nor does it move on to the next file in the folder.

I don't have any timeouts specified. Is there anything else I'm missing?

Upvotes: 0

Views: 1389

Answers (1)

danny

Reputation: 5270

The line records = cursor.fetchall() is likely the culprit: it loads the entire result set of the query into memory at once. Given how large your queries are, that data probably cannot all fit in memory.

You should iterate over the results from the cursor and write them to your CSV one row at a time. In general, trying to read all the data from a database query at once is not a good idea.

You will need to refactor your code to do so, for example with the standard csv module:

writer = csv.DictWriter(csv_fh, fieldnames=[col[0] for col in cursor.description])
writer.writeheader()
for record in cursor:
    writer.writerow(record)

Here csv_fh is a file handle to your CSV file, opened for writing. Your use of pd.DataFrame will need rewriting, as it expects all of the data to be passed to it at once.
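
Putting that together with your original execute_query, a minimal sketch of a streaming version might look like this. The method name execute_query_to_csv, the cursor name, and the batch size are just illustrative; the key point is that a psycopg2 named (server-side) cursor fetches rows in batches of itersize as you iterate, instead of pulling the whole result set into memory the way fetchall() does:

import csv
import psycopg2
import psycopg2.extras

def execute_query_to_csv(self, source_path, dest_filename):
    conn_rs = psycopg2.connect(self.conn_string)
    # A named cursor is server-side: iterating it fetches rows in batches
    # of itersize rather than loading the whole result set at once.
    cursor = conn_rs.cursor(name='weekly_export',
                            cursor_factory=psycopg2.extras.RealDictCursor)
    cursor.itersize = 10000  # illustrative batch size
    cursor.execute(self.read_sql_file(source_path))

    with open(dest_filename, 'w', newline='') as csv_fh:
        writer = None
        for record in cursor:
            if writer is None:
                # Build the header lazily from the first row's column names.
                writer = csv.DictWriter(csv_fh, fieldnames=list(record.keys()))
                writer.writeheader()
            writer.writerow(record)

    cursor.close()
    conn_rs.close()

If the calling loop only needs the CSV on disk, a method like this removes the need to build a DataFrame at all; if you do still need a DataFrame elsewhere, pandas.read_sql with a chunksize argument is another option worth looking at.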

Upvotes: 1
