Henry Dang

Reputation: 179

pyodbc: Memory Error using fast_executemany with TEXT / NTEXT columns

I'm having an issue with inserting rows into a database. Does anyone have any idea why this is happening? The insert works when I avoid using fast_executemany, but then it becomes very slow.

import pyodbc

driver = 'ODBC Driver 17 for SQL Server'
conn = pyodbc.connect('DRIVER=' + driver + ';SERVER=' + server +
                      ';UID=' + user + ';PWD=' + password)
cursor = conn.cursor()
cursor.fast_executemany = True

insert_sql = """
INSERT INTO table (a, b, c)
VALUES (?, ?, ?)
"""

cursor.executemany(insert_sql, insert_params)

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-12-e7e82e4d8c2d> in <module>
      2 start_time = time.time()
      3 
----> 4 cursor.executemany(insert_sql, insert_params)

MemoryError:

Upvotes: 5

Views: 6668

Answers (3)

longisland21

Reputation: 21

In my case, the MemoryError occurred because I was using the very old 'SQL Server' driver. Switching to the newer driver ('ODBC Driver 17 for SQL Server'), as described in the link below, fixed it: link
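A minimal sketch of that change in the connection string, reusing the server, user, and password variables from the question (the old driver name shown in the comment is the legacy 'SQL Server' driver mentioned above):

import pyodbc

# Legacy driver that triggered the MemoryError in my case:
# conn = pyodbc.connect('DRIVER=SQL Server;SERVER=' + server +
#                       ';UID=' + user + ';PWD=' + password)

# Newer driver that worked with fast_executemany:
conn = pyodbc.connect('DRIVER=ODBC Driver 17 for SQL Server;SERVER=' + server +
                      ';UID=' + user + ';PWD=' + password)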

Upvotes: 2

bsplosion

Reputation: 2886

While Gord Thompson's answer solved this issue for the OP, it's worth noting that the question as written applies more broadly: fast_executemany can raise a MemoryError in circumstances beyond just the use of [N]TEXT columns.

In my case a MemoryError was thrown during an attempt to INSERT several million records at once, and as noted here, "parameter values are held in memory, so very large numbers of records (tens of millions or more) may cause memory issues". It doesn't necessarily require tens of millions to trigger, so YMMV.

An easy solution is to pick a sane number of records to send per executemany call and split the data into batches. Here's an example using a pandas DataFrame as the source (establish your insert_query as usual):

import math
import numpy as np

batch_size = 5000  # Desired number of rows per batch
# np.array_split takes the number of chunks, not the chunk size,
# so compute how many chunks of roughly batch_size rows are needed
num_chunks = max(1, math.ceil(len(df) / batch_size))

with connection.cursor() as cursor:
    try:
        cursor.fast_executemany = True

        # Iterate over the dataframe one chunk at a time
        for chunk in np.array_split(df, num_chunks):
            cursor.executemany(insert_query,
                               chunk.values.tolist())

        # Run a single commit at the end of the transaction
        connection.commit()

    except Exception:
        # Roll back on any exception and re-raise it
        connection.rollback()
        raise

Hope this helps anyone who hits this issue and doesn't have any [N]TEXT columns on their target!

Upvotes: 2

Gord Thompson

Reputation: 123829

There is a known issue with fast_executemany when working with TEXT or NTEXT columns, as described on GitHub here.

The problem is that when pyodbc queries the database metadata to determine the maximum size of the column, the driver returns 2 GB (instead of 0, as it would for a [n]varchar(max) column).

pyodbc allocates 2 GB of memory for each [N]TEXT element in the parameter array, and the Python app quickly runs out of memory.

The workaround is to use cursor.setinputsizes([(pyodbc.SQL_WVARCHAR, 0, 0)]) (as described here) to coax pyodbc into treating [N]TEXT columns like [n]varchar(max) columns.
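Applied to the insert from the question, the workaround might look like the sketch below. It assumes column b is the [N]TEXT column; setinputsizes takes one entry per parameter marker, and the None entries (leaving a and c at pyodbc's default handling) are an assumption here, so adjust the list to match your actual columns.

cursor.fast_executemany = True

insert_sql = """
INSERT INTO table (a, b, c)
VALUES (?, ?, ?)
"""

# One entry per ? placeholder; treat the [N]TEXT column like nvarchar(max)
cursor.setinputsizes([
    None,                         # a - default handling (assumed non-TEXT)
    (pyodbc.SQL_WVARCHAR, 0, 0),  # b - assumed [N]TEXT column
    None,                         # c - default handling (assumed non-TEXT)
])

cursor.executemany(insert_sql, insert_params)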

(Given that [N]TEXT is a deprecated column type for SQL Server it is unlikely that there will be a formal fix for this issue.)

Upvotes: 10
