I'm trying to update around 500k rows in an SQLite database. I can insert them quickly, but the update appears to hang indefinitely, with no error message (an insert of the same size took 35 seconds; this update has been running for over 12 hours).
The portion of my code that does the updating is:
for col in col_list:
    if --- blah blah blah ---:
        stuff
    else:
        counter = 1
        print("Starting to append result_list...")
        result_list = []
        for line in result:
            result_list.append((str(line), counter))
            counter += 1
        sql = 'UPDATE BRFSS2015 SET ' + col[1] + \
              ' = ? where row_id = ?'
        print("Executing SQL...")
        c.executemany(sql, result_list)
        print("Committing.")
        conn.commit()
It prints "Executing SQL..." and presumably attempts the executemany; that's where it gets stuck. The variable result is a list of records and appears to be fine, since the insert statement is basically the same and works.
Am I misusing executemany? I see many threads on executemany(), but as far as I can tell they all involve an error message, not a call that just hangs indefinitely.
For reference, the full code I have is below. Basically I'm trying to convert an ASCII file to an SQLite database. I know I could technically insert all columns at the same time, but the machines I have access to are all limited to 32-bit Python and they run out of memory (the file is quite large, close to 1 GB of text).
import pandas as pd
import sqlite3

ascii_file = r'c:\Path\to\file.ASC_'
sqlite_file = r'c:\path\to\sqlite.db'

conn = sqlite3.connect(sqlite_file)
c = conn.cursor()

# Column layout taken from https://www.cdc.gov/brfss/annual_data/2015/llcp_varlayout_15_onecolumn.html
# Each entry is [start position, column name, field width].
raw_list = [[1, "_STATE", 2],
            [17, "FMONTH", 2],
            ... many other values here
            [2154, "_AIDTST3", 1],]

# Convert the 1-based start/width layout into the 0-based
# (begin, end) colspecs that pandas.read_fwf expects.
col_list = []
for col in raw_list:
    begin = (col[0] - 1)
    col_name = col[1]
    end = (begin + col[2])
    col_list.append([(begin, end,), col_name,])

# Parse and load one column at a time to stay within 32-bit memory limits.
for col in col_list:
    print(col)
    col_specification = [col[0]]
    print("Parsing...")
    data = pd.read_fwf(ascii_file, colspecs=col_specification)
    print("Done")
    result = data.iloc[:, [0]]
    result = result.values.flatten()
    sql = '''CREATE table if not exists BRFSS2015
             (row_id integer NOT NULL,
             ''' + col[1] + ' text)'
    print(sql)
    c.execute(sql)
    conn.commit()
    sql = '''ALTER TABLE
             BRFSS2015 ADD COLUMN ''' + col[1] + ' text'
    try:
        c.execute(sql)
        print(sql)
        conn.commit()
    except Exception as e:
        print("Error Happened instead")
        print(e)
    counter = 1
    result_list = []
    for line in result:
        result_list.append((counter, str(line)))
        counter += 1
    if '_STATE' in col:
        # First column: insert the rows.
        counter = 1
        result_list = []
        for line in result:
            result_list.append((counter, str(line)))
            counter += 1
        sql = 'INSERT into BRFSS2015 (row_id,' + col[1] + ')' \
              + ' values (?,?)'
        c.executemany(sql, result_list)
    else:
        # Every later column: update the existing rows.
        counter = 1
        print("Starting to append result_list...")
        result_list = []
        for line in result:
            result_list.append((str(line), counter))
            counter += 1
        sql = 'UPDATE BRFSS2015 SET ' + col[1] + \
              ' = ? where row_id = ?'
        print("Executing SQL...")
        c.executemany(sql, result_list)
        print("Committing.")
        conn.commit()
    print("Committed... moving on to next column...")
For each row to be updated, the database has to search for that row. (This is not necessary when inserting.) If there is no index on the row_id column, then the database has to go through the entire table for each update.
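You can see this with EXPLAIN QUERY PLAN. A minimal sketch, assuming the question's table and its FMONTH column (in Python's sqlite3, unbound ? parameters are simply treated as NULL, which is fine for inspecting a plan):

import sqlite3

conn = sqlite3.connect(r'c:\path\to\sqlite.db')  # path from the question
c = conn.cursor()

# Without an index on row_id this reports a full-table SCAN;
# with one, it reports a SEARCH using the index.
for row in c.execute('EXPLAIN QUERY PLAN '
                     'UPDATE BRFSS2015 SET FMONTH = ? WHERE row_id = ?'):
    print(row)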
It would be a better idea to insert entire rows at once. If that is not possible, create an index on row_id, or better, declare it as INTEGER PRIMARY KEY.
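A minimal sketch of both options, reusing the table name from the question (the index name is made up for illustration):

import sqlite3

conn = sqlite3.connect(r'c:\path\to\sqlite.db')  # path from the question
c = conn.cursor()

# Better: declare row_id as INTEGER PRIMARY KEY when creating the table.
# It then aliases SQLite's internal rowid, so every "WHERE row_id = ?"
# is a direct B-tree key lookup instead of a full-table scan.
c.execute('''CREATE TABLE IF NOT EXISTS BRFSS2015
             (row_id INTEGER PRIMARY KEY)''')

# Alternative, if the table already exists with a plain row_id column:
c.execute('CREATE INDEX IF NOT EXISTS idx_row_id ON BRFSS2015 (row_id)')
conn.commit()

Either way, each of the 500k updates becomes a logarithmic-time key lookup rather than a scan of the whole table.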