Reputation: 559
I have a script which converts a date stored in a database from Unix time (epoch) to a human-readable format. There are 30,000 records.
Pulling the data from the database, converting it, and printing it to the screen is very fast. However, pulling the data from the database, converting it, and executing an "update" statement to update each record is extremely slow.
Is there any way to optimise the following code to speed this process up for the 30,000 records I have?
cur.execute('select Atime from Hash where Atime like (?) ', (test,))
results = cur.fetchall()
for row in results:
    convertedtime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(float(row[0])))
    print convertedtime
    cur.execute('Update Hash set Atime = (?) where Atime = (?)', (convertedtime, row[0]))
con.commit()
The con.commit() is outside the for loop, so committing after every iteration is not the issue.
Upvotes: 3
Views: 1135
Reputation: 164689
The major performance problem is that you're pulling the data out of SQLite, loading it into Python, converting it in Python, and then putting it back into the database one row at a time. That is never going to be efficient.
Instead, use SQLite's own built-in date and time functions. It looks like atime is a Unix epoch time.
update hash set atime = datetime(atime, 'unixepoch', 'localtime');
But you probably don't want to store dates in the local time zone. Time zones get complicated, and daylight saving time has missing and overlapping times... it just leads to tears. You definitely don't want to store a datetime in the local time zone without indicating what that time zone is!
Unless you have a really good reason, store it as UTC.
update hash set atime = datetime(atime, 'unixepoch');
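As a sketch of what that single-statement approach looks like from Python's sqlite3 module (the hash table and atime column come from the question; the in-memory database and sample epoch values are invented for illustration):

```python
import sqlite3

# In-memory stand-in for the real database; schema assumed from the question.
con = sqlite3.connect(":memory:")
con.execute("create table hash (atime)")
con.executemany("insert into hash (atime) values (?)",
                [(0,), (86400,)])  # sample epoch values

# One statement converts every row inside SQLite -- no Python loop.
con.execute("update hash set atime = datetime(atime, 'unixepoch')")
con.commit()

print(con.execute("select atime from hash").fetchall())
# 0 seconds after the epoch is 1970-01-01 00:00:00 UTC
```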
In general, if you want to do things SQLite does not support, create a user-defined function and use it in a query. This will be less efficient than using built-in SQLite functions, but more efficient than selecting, converting, and updating.
It would look something like this.
def epoch_to_iso8601(epoch):
    return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(float(epoch)))

con.create_function("epoch_to_iso8601", 1, epoch_to_iso8601)
Then you can use epoch_to_iso8601 in a query.
update hash set atime = epoch_to_iso8601(atime);
Note that this is not the same as a stored procedure. Because there is no SQLite server, all the code runs in your own process, and the function is defined only there, not in the database itself.
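Putting those pieces together, a minimal runnable sketch (in-memory database assumed; it uses time.gmtime rather than the question's time.localtime so the result doesn't depend on the machine's time zone):

```python
import sqlite3
import time

def epoch_to_iso8601(epoch):
    # gmtime keeps the output time-zone independent; the question used localtime
    return time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(float(epoch)))

con = sqlite3.connect(":memory:")
con.create_function("epoch_to_iso8601", 1, epoch_to_iso8601)

con.execute("create table hash (atime)")
con.execute("insert into hash (atime) values (?)", (86400,))

# The Python function runs inside the query, once per row.
con.execute("update hash set atime = epoch_to_iso8601(atime)")
print(con.execute("select atime from hash").fetchone()[0])  # 1970-01-02 00:00:00
```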
The real problem here is that you're storing datetimes as strings. This makes them slow and awkward to work with. It means you have to commit to a single format. It means you have to parse that format to do anything with it. It means you can't use the built-in SQLite date and time functions (sparse as they are).
What you actually want to do is leave atime as a Unix epoch time and format it as needed per query.
select datetime(atime, 'unixepoch') from hash;
Fortunately, SQLite is very loosey-goosey with its types and will convert the text atime field to a number for you, though this incurs a performance and storage penalty.
Ideally you'd change atime to use the datetime type, but this is difficult in SQLite: it doesn't support removing or modifying existing columns. Instead you have to dump the data in the table, recreate the table, and re-import the data. This should be very fast with just 30,000 records.
Switch to CSV mode, send the output to a file, and select everything.
sqlite> .mode csv
sqlite> .output hash.out
sqlite> select * from hash;
Drop the existing table and recreate it the same, but with atime as a datetime.
sqlite> drop table hash;
sqlite> create table hash ( atime datetime, and the other columns );
Import the dump.
sqlite> .import hash.out hash
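If you'd rather do the migration from Python than from the sqlite3 shell, the same dump-and-recreate dance can be done with a rename plus an insert-select, wrapped in a transaction. This is a sketch only: the column list is assumed to be just atime, so adapt it to the real schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Stand-in for the existing schema: atime stored as a text epoch, as in the question.
con.execute("create table hash (atime text)")
con.execute("insert into hash (atime) values ('86400')")

with con:  # the whole migration commits or rolls back as one transaction
    con.execute("alter table hash rename to hash_old")
    con.execute("create table hash (atime datetime)")  # ...and the other columns
    con.execute("insert into hash (atime) "
                "select datetime(atime, 'unixepoch') from hash_old")
    con.execute("drop table hash_old")

print(con.execute("select atime from hash").fetchone()[0])  # 1970-01-02 00:00:00
```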
Upvotes: 4