Reputation: 179
I'm having an issue inserting rows into a SQL Server database with pyodbc. Does anyone have any idea why this is happening? The insert works when I avoid using fast_executemany, but then it becomes very slow.
import pyodbc

# server, user, password and insert_params are defined elsewhere
driver = 'ODBC Driver 17 for SQL Server'
conn = pyodbc.connect('DRIVER=' + driver + ';SERVER=' + server +
                      ';UID=' + user + ';PWD=' + password)
cursor = conn.cursor()
cursor.fast_executemany = True

insert_sql = """
INSERT INTO table (a, b, c)
VALUES (?, ?, ?)
"""
cursor.executemany(insert_sql, insert_params)
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-12-e7e82e4d8c2d> in <module>
2 start_time = time.time()
3
----> 4 cursor.executemany(insert_sql, insert_params)
MemoryError:
Upvotes: 5
Views: 6668
Reputation: 21
In my case, the MemoryError was because I was using the very old 'SQL Server' driver. Switching to the newer driver ('ODBC Driver 17 for SQL Server') as described in the link below fixed it: link
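Roughly, the only thing that changed was the DRIVER= part of the connection string (the server name and credentials below are just placeholders):

import pyodbc

# Old, legacy driver that produced the MemoryError for me:
# conn = pyodbc.connect('DRIVER={SQL Server};SERVER=myserver;UID=user;PWD=secret')

# Newer driver that works with fast_executemany:
conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=myserver;UID=user;PWD=secret')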
Upvotes: 2
Reputation: 2886
While this issue was solved for the OP by Gord Thompson's answer, I wanted to note that the question as written applies to other cases where a MemoryError may occur, and fast_executemany can actually throw one in circumstances beyond just the use of [N]TEXT columns.
In my case a MemoryError was thrown during an attempt to INSERT several million records at once. As noted here, "parameter values are held in memory, so very large numbers of records (tens of millions or more) may cause memory issues". It doesn't necessarily take tens of millions of rows to trigger it, so YMMV.
An easy solution is to identify a sane number of records to batch per execute. Here's an example using a Pandas DataFrame as a source (establish your insert_query as usual):
import math

import numpy as np

batch_size = 5000  # Set to a desirable number of records per batch
# Number of chunks needed so that each chunk holds at most batch_size records
n_chunks = max(1, math.ceil(len(df) / batch_size))

with connection.cursor() as cursor:
    try:
        cursor.fast_executemany = True
        # Iterate over each batch chunk using numpy's split
        for chunk in np.array_split(df, n_chunks):
            cursor.executemany(insert_query, chunk.values.tolist())
        # Run a single commit at the end of the transaction
        connection.commit()
    except Exception:
        # Roll back on any exception
        connection.rollback()
        raise
Hope this helps anyone who hits this issue and doesn't have any [N]TEXT columns on their target!
Upvotes: 2
Reputation: 123829
There is a known issue with fast_executemany when working with TEXT or NTEXT columns, as described on GitHub here.
The problem is that when pyodbc queries the database metadata to determine the maximum size of the column, the driver returns 2 GB (instead of 0, as would be returned for a [n]varchar(max) column). pyodbc then allocates 2 GB of memory for each [N]TEXT element in the parameter array, and the Python app quickly runs out of memory.
The workaround is to use cursor.setinputsizes([(pyodbc.SQL_WVARCHAR, 0, 0)]) (as described here) to coax pyodbc into treating [N]TEXT columns like [n]varchar(max) columns.
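For context, here is a minimal sketch of how that call slots into code like the OP's. The notes table, note_text column, and connection details are made up for illustration, and the setinputsizes list has a single entry because this statement has a single [N]TEXT parameter:

import pyodbc

# Assumes an existing connection, e.g.:
# conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=...;UID=...;PWD=...')

cursor = conn.cursor()
cursor.fast_executemany = True

# Hypothetical statement whose single parameter targets an NTEXT column
insert_sql = "INSERT INTO notes (note_text) VALUES (?)"

# Treat the [N]TEXT parameter like nvarchar(max) so pyodbc does not
# reserve 2 GB per element in the parameter array
cursor.setinputsizes([(pyodbc.SQL_WVARCHAR, 0, 0)])

cursor.executemany(insert_sql, [('first note',), ('second note',)])
conn.commit()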
(Given that [N]TEXT is a deprecated column type for SQL Server, it is unlikely that there will be a formal fix for this issue.)
Upvotes: 10