Reputation: 1054
I am fetching data from a SQL Server database and saving it to files for subsequent processing in Python.
I am using Make to automate fetching and re-fetching of the data (when some settings change, only the affected queries are run anew, not all of them). So I have a simple Makefile as follows:
    rawdata: datafile1.h5 datafile2.h5 # ... more files like this

    datafile1.h5: data1_specific_config.py common_config.py
        python fetch_data1.py

    datafile2.h5: data2_specific_config.py common_config.py
        python fetch_data2.py

    # ... similar rules for other files

(Note: prerequisites in Make are separated by spaces, not commas.)
and when needed I just run `make rawdata`.
Now all the SQL queries executed by the scripts fetch_dataN.py share a significant common part. Schematically, the queryN run by fetch_dataN.py looks like this:

    select ... into ##common_tmp_table ... /* this is identical for all queries */
    select ... from (... ##common_tmp_table ...) /* this is queryN-specific, but the same ##common_tmp_table is used */
Here is the problem: when I now run `make rawdata` in a situation where, say, five different datafiles need to be rebuilt, the same query select ... into ##common_tmp_table ... is run five times with identical output into ##common_tmp_table. The query takes quite a long time to run, so re-executing it five times slows everything down significantly. But the temporary table is always dropped when a script fetch_dataN.py finishes, because the db connection that created it is closed.
Question:
Is there a way to force the table ##common_tmp_table to be created only once and to persist across all the scripts fetch_dataN.py executed by `make rawdata`?
In particular, is there a way to use the same db connection in all the scripts run by `make rawdata`? Or perhaps to open one extra connection that persists while all the scripts are running and prevents the global temporary table from being dropped?
Work-around that I know of:
I can work around this by manually creating the ##common_tmp_table (e.g. in MS SQL Server Management Studio) before running `make rawdata` and keeping the connection used for this open until all the scripts finish. But this is obviously ugly and annoying.
If `make rawdata` could spawn a separate process that opens a connection, creates the tmp table, and keeps waiting until everything else finishes, that would be a solution. But I don't know whether this is possible.
Notes:
- MS SQL Server 2008 R2
- pyodbc 4.0.28 (for connecting to the database)
- python 3.7.6
- make 4.3
- conda 4.7.12
Thanks.
Upvotes: 0
Views: 251
Reputation: 1054
So I found a solution which works very nicely. The idea is to let `make rawdata` execute a python script which:

1. opens a db connection, creates the ##common_tmp_table, and keeps the connection open,
2. runs `make rawdata_`, which takes care of rebuilding the datafiles (just the same as `make rawdata` did in the code posted in the question, but now without select ... into ##common_tmp_table ... in the queries),
3. closes the connection (which drops the tmp table).

In code:
Makefile:
    # THIS IS NEW
    .PHONY: rawdata  # to always rebuild the rawdata target
    rawdata:
        python fetch_all_non_uptodate.py  # just call a script that (among other stuff) runs `make rawdata_`

    # THE REST IS AS BEFORE (just added underscore)
    rawdata_: datafile1.h5 datafile2.h5 # ... more files like this

    datafile1.h5: data1_specific_config.py common_config.py
        python fetch_data1.py

    datafile2.h5: data2_specific_config.py common_config.py
        python fetch_data2.py

    # ... similar rules for other files
fetch_all_non_uptodate.py:
    import subprocess
    import pyodbc

    conn = pyodbc.connect(...)  # open db connection

    # Simulate a run of make with the -q flag to find out whether all the datafiles
    # are up to date (return code 0) or not (return code 1); nothing is re-fetched yet.
    uptodate = (subprocess.run(['make', '-q', 'rawdata_']).returncode == 0)

    if not uptodate:
        # the raw datafiles are not up to date
        create_common_tmp_table(conn)  # create the ##common_tmp_table; it persists while conn is open
        conn.commit()  # commit the creation of the tmp table (important! - otherwise the other connections won't see it)
        subprocess.run(['make', 'rawdata_'])  # re-fetch whatever datafiles need to be re-fetched;
                                              # the queries can make use of the existing tmp table
    else:
        # otherwise just simulate the make output saying that everything is up to date
        print("make: Nothing to be done for 'rawdata'.")

    conn.close()
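One detail worth noting: `make -q` exits with 0 when everything is up to date, 1 when some target needs rebuilding, and 2 when an error occurs, so the one-liner above also treats Makefile errors as "rebuild needed". A small sketch (the function name is mine) that distinguishes the three cases:

```python
import subprocess

def needs_rebuild(target: str) -> bool:
    """Ask `make -q` whether `target` would be rebuilt.

    Exit code 0 means up to date, 1 means a rebuild is needed,
    and 2 (or anything else) means make itself hit an error.
    """
    rc = subprocess.run(['make', '-q', target]).returncode
    if rc in (0, 1):
        return rc == 1
    raise RuntimeError(f"'make -q {target}' failed with exit code {rc}")
```

With this, fetch_all_non_uptodate.py would call `needs_rebuild('rawdata_')` instead of comparing the return code to 0 directly, so a broken Makefile raises an error instead of silently triggering a re-fetch.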
queryN:
/*keep just the specific part - the ##common_tmp_table already exists*/
select ... from (... ##common_tmp_table ...)
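For completeness, a hypothetical sketch of what a fetch_dataN.py can look like after this change. The connection string, HDF5 key, and file name are illustrative assumptions, and the elided parts of the query are kept as placeholders:

```python
QUERY_SPECIFIC = """
select ... from (... ##common_tmp_table ...)
"""  # placeholder from the question; the real query-specific part goes here

def fetch(query: str, out_path: str) -> None:
    # Imports kept inside the function: pyodbc needs a working ODBC setup.
    import pyodbc
    import pandas as pd

    conn = pyodbc.connect('DSN=mydb')  # assumed connection string
    try:
        # ##common_tmp_table already exists: fetch_all_non_uptodate.py created it
        # and keeps the connection that created it open while this script runs.
        df = pd.read_sql(query, conn)
        df.to_hdf(out_path, key='data', mode='w')  # assumed HDF5 key
    finally:
        conn.close()

if __name__ == '__main__':
    fetch(QUERY_SPECIFIC, 'datafile1.h5')
```

Because the script only reads from the shared temp table, it needs no commit of its own; it just has to run while the creating connection is still open.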
Upvotes: 1