ASYNC - Pandas read_sql and asyncio?

Question

Could someone please point me in the right direction on how to solve the following problem? I am trying to come up with a solution using pandas.read_sql and asyncio. I want to migrate table records from one database to another.

I want to do this for several tables:

table 1
...
table n

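I have the following function:

import pandas as pd

# CONNECTION is the source-database connection (e.g. a SQLAlchemy engine), defined elsewhere.

def extract(table):
    # Table names cannot be bound as query parameters, so only pass trusted names here.
    sql = f"SELECT * FROM {table}"
    # Read the table in chunks of 100,000 rows and concatenate them into one DataFrame.
    chunks = pd.read_sql(sql, con=CONNECTION, chunksize=10**5)
    return pd.concat(chunks, ignore_index=True)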
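For reference, here is a self-contained sketch of the single-table step, including the load into the target database with DataFrame.to_sql. It assumes two throwaway in-memory SQLite databases in place of the real source and target, and passes the connection explicitly instead of using the global CONNECTION; the load() helper and the table contents are hypothetical.

```python
import sqlite3

import pandas as pd


def extract(table, con, chunksize=10**5):
    """Read a whole table in chunks and concatenate them into one DataFrame."""
    # Table names cannot be bound as query parameters, so only pass trusted names.
    chunks = pd.read_sql(f"SELECT * FROM {table}", con=con, chunksize=chunksize)
    return pd.concat(chunks, ignore_index=True)


def load(df, table, con):
    """Hypothetical load step: write the DataFrame, replacing any existing table."""
    df.to_sql(table, con, if_exists="replace", index=False)


# Two in-memory SQLite databases stand in for the real source and target.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
source.executemany(
    "INSERT INTO table1 VALUES (?, ?)", [(i, f"row{i}") for i in range(250)]
)
source.commit()

# A small chunksize forces several chunks so the concat path is exercised.
df = extract("table1", source, chunksize=100)
load(df, "table1", target)
print(target.execute("SELECT COUNT(*) FROM table1").fetchone()[0])  # 250
```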

I want to run these in parallel instead of one by one, as in:

extract(table1)
extract(table2)
...
extract(tablen)

user4815162342 · Accepted Answer

asyncio is about organizing non-blocking code into callbacks and coroutines; pd.read_sql is an ordinary blocking call, so asyncio alone will not make it run in parallel. Running blocking code like this in parallel is a use case for threads:

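from concurrent.futures import ThreadPoolExecutor

# all_tables is the list of table names, e.g. ["table1", "table2", ...]
with ThreadPoolExecutor() as executor:
    # map() runs extract() on the pool's threads and returns the
    # DataFrames in the same order as all_tables.
    frames = list(executor.map(extract, all_tables))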

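Whether this actually runs faster than sequential code depends on how much of pd.read_sql's time is spent with the GIL released. Time spent waiting on the database driver's network I/O typically releases it, whereas converting the fetched rows into a DataFrame largely does not.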
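A toy illustration of why GIL release matters, assuming a hypothetical fake_extract() that only sleeps: time.sleep releases the GIL, much like a driver blocked waiting for the database server, so the threads' waits overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_extract(table):
    # Hypothetical stand-in for pd.read_sql: time.sleep releases the GIL,
    # similar to a driver waiting for the server to answer.
    time.sleep(0.5)
    return table


tables = [f"table{n}" for n in range(1, 5)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fake_extract, tables))
elapsed = time.perf_counter() - start

# The four 0.5 s waits overlap, so this takes roughly 0.5 s rather than 2 s.
print(results, f"{elapsed:.2f}s")
```

Replacing the sleep with a pure-Python loop would show little or no speedup on a standard CPython build, because such a loop holds the GIL the whole time.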
