Dr.Pepper

Reputation: 559

Python SQLite - Slow UPDATE Records

I have a script which converts a date stored in a database from Unix time (epoch) to a human-readable format. There are 30,000 records.
Pulling the data from the database, converting it, and printing it to the screen is very fast. However, pulling the data, converting it, and executing an "update" statement to write each record back is extremely slow.
Is there any way to optimise the following code to speed this process up for the 30,000 records I have?

    cur.execute('select Atime from Hash where Atime like (?) ', (test,))
    results = cur.fetchall()
    for row in results:
        convertedtime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime((float(row[0]))))
        print convertedtime
        cur.execute('Update Hash set Atime = (?) where Atime = (?)', (convertedtime, row[0]))
    con.commit()

The con.commit() is outside the for loop, so the slowdown is not caused by committing after every iteration.

Upvotes: 3

Views: 1135

Answers (1)

Schwern

Reputation: 164689

The major performance problem is you're pulling the data out of SQLite, loading it into Python, converting it in Python, then putting it back into the database one date at a time. This is never going to be efficient.

Instead, use SQLite's own built-in date and time functions. It looks like atime is Unix epoch time.

update hash set atime = datetime(atime, 'unixepoch', 'localtime');

But you probably don't want to store dates in the local time zone. Time zones get complicated, and daylight saving time has missing and overlapping times... it just leads to tears. You definitely don't want to store a datetime in the local time zone without indicating what that time zone is!

Unless you have a really good reason, store it as UTC.

update hash set atime = datetime(atime, 'unixepoch');
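From Python, that whole conversion becomes one `execute` call. Here's a minimal sketch using an in-memory database as a stand-in for the real file (the single-column `hash` table is an assumption for the demo):

```python
import sqlite3

# In-memory stand-in for the real database file (demo assumption).
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table hash (atime)")
cur.execute("insert into hash values (0)")

# One statement converts every row inside SQLite -- no Python loop,
# no round-tripping rows through the application.
cur.execute("update hash set atime = datetime(atime, 'unixepoch')")
con.commit()

print(cur.execute("select atime from hash").fetchone()[0])
# -> 1970-01-01 00:00:00
```

With a real file the only change is the path passed to `sqlite3.connect`.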

In general, if you want to do things SQLite does not support, create a user-defined function and use it in a query. This will be less efficient than using built-in SQLite functions, but more efficient than selecting, converting, and updating row by row.

It would look something like this.

import time  # needed by the conversion function

def epoch_to_iso8601(epoch):
    return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(float(epoch)))

con.create_function("epoch_to_iso8601", 1, epoch_to_iso8601)

Then you can use epoch_to_iso8601 in a query.

update hash set atime = epoch_to_iso8601(atime);

Note that this is not the same as a stored procedure. Because there is no SQLite server, all the code runs in your process, and the function is only available on connections that register it.

See sqlite3.create_function.
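Put together, the registration and the query look like this. A self-contained sketch, again assuming a one-column `hash` table in an in-memory database:

```python
import sqlite3
import time

def epoch_to_iso8601(epoch):
    # Same conversion the question performs in its Python loop.
    return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(float(epoch)))

# In-memory stand-in for the real database (demo assumption).
con = sqlite3.connect(":memory:")
con.execute("create table hash (atime)")
con.execute("insert into hash values (1500000000)")

# Register the Python function on this connection so SQL can call it.
con.create_function("epoch_to_iso8601", 1, epoch_to_iso8601)
con.execute("update hash set atime = epoch_to_iso8601(atime)")
con.commit()

print(con.execute("select atime from hash").fetchone()[0])
```

The printed value depends on your local time zone, which is exactly the portability problem described above.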


The real problem here is you're storing datetimes as strings. This makes them slow and awkward to work with. It means you have to choose a single format. It means you have to parse that format to do anything with it. It means you can't use the built-in SQLite date and time functions (sparse as they are).

What you actually want to do is leave atime as a Unix epoch time and format it as you need per query.

select datetime(atime, 'unixepoch') from hash;
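In Python that keep-the-epoch approach looks like this, sketched against an in-memory table (the epoch value is just for illustration):

```python
import sqlite3

# Keep atime as the raw epoch; format only on output.
con = sqlite3.connect(":memory:")
con.execute("create table hash (atime)")
con.execute("insert into hash values (86400)")  # one day after the epoch

for (formatted,) in con.execute("select datetime(atime, 'unixepoch') from hash"):
    print(formatted)
# -> 1970-01-02 00:00:00
```

The stored value stays a plain number, so sorting and comparing remain cheap, and each query can choose its own display format.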

Fortunately SQLite is very loosey-goosey with its types and will convert the text atime field to a number for you, though it will incur a performance and storage penalty.


Ideally you'd want to change atime to use the datetime type, but this is difficult in SQLite. It doesn't support removing or modifying existing columns. Instead you'd have to dump the data in the table, recreate the table, and import the data. This should be very fast with just 30,000 records.

Switch to CSV mode, send the output to a file, and select everything.

sqlite> .mode csv
sqlite> .output hash.out
sqlite> select * from hash;

Drop the existing table and recreate it the same but with atime as a datetime.

sqlite> drop table hash;
sqlite> create table hash ( atime datetime, and the other columns );

Import the dump.

sqlite> .import hash.out hash

Upvotes: 4
