Is there a way to do something like this inside the SQL query? Maybe provide a list as an input argument? The dates I want are consecutive, but not all dates exist in the database. If a date doesn't exist, the result should be None.
dates = [dt.datetime(2008, 1, 1), dt.datetime(2008, 1, 2), dt.datetime(2008, 1, 3),
         dt.datetime(2008, 1, 4), dt.datetime(2008, 1, 5)]
id = "361-442"
result = []
for date in dates:
    curs.execute('''SELECT price, date FROM prices WHERE date = ? AND id = ?''', (date, id))
    query = curs.fetchall()
    if query == []:
        result.append([None, date])
    else:
        result.append(query)
Upvotes: 4
Views: 7103
To perform all the work in sqlite, you could use a LEFT JOIN to fill in missing prices with None:
sql = '''
    SELECT p.price, t.date
    FROM ( {t} ) t
    LEFT JOIN prices p
    ON p.date = t.date AND p.id = ?
'''.format(t=' UNION ALL '.join('SELECT {d!r} date'.format(d=str(d)) for d in dates))
cursor.execute(sql, [id])
result = cursor.fetchall()
However, this solution requires forming a (potentially) huge string in Python in order to create a temporary table of all desired dates. It is not only slow (including the time it takes sqlite to build the temporary table), it is also brittle: if len(dates) is greater than about 500, then sqlite raises
OperationalError: too many terms in compound SELECT
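One way to sidestep that limit is to issue the query in chunks of at most 500 dates and concatenate the results in Python. Here is a minimal sketch; the toy prices table and sample data are assumptions for illustration:

```python
import sqlite3

# Assumed toy schema and data; the real prices table is much larger.
conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE prices (id TEXT, date TEXT, price REAL)')
curs.execute("INSERT INTO prices VALUES ('361-442', '2008-01-01', 1.5)")

dates = ['2008-01-01', '2008-01-02', '2008-01-03']
id = '361-442'

CHUNK = 500  # stay under sqlite's default compound-SELECT term limit
result = []
for i in range(0, len(dates), CHUNK):
    chunk = dates[i:i + CHUNK]
    sql = '''
        SELECT p.price, t.date
        FROM ( {t} ) t
        LEFT JOIN prices p
        ON p.date = t.date AND p.id = ?
    '''.format(t=' UNION ALL '.join('SELECT {d!r} date'.format(d=d)
                                    for d in chunk))
    curs.execute(sql, [id])
    result.extend(curs.fetchall())
```

Each chunk still pays the string-building cost, so this only works around the error; it does not make the approach fast.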
You might be able to get around this if you already have all the desired dates in some other table. Then you could replace the ugly UNION ALL SQL above with something like:
SELECT p.price, t.date
FROM ( SELECT date FROM dates ) t
LEFT JOIN prices p
ON p.date = t.date
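If no such table exists yet, one could be created and filled with executemany, which avoids building a giant SQL string. A sketch, with an assumed toy prices table and an assumed dates/date schema for the helper table:

```python
import sqlite3

# Assumed toy data; the real prices table already exists.
conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE prices (id TEXT, date TEXT, price REAL)')
curs.execute("INSERT INTO prices VALUES ('361-442', '2008-01-02', 2.0)")

dates = ['2008-01-01', '2008-01-02', '2008-01-03']

# A TEMP table is private to this connection and vanishes when it closes.
curs.execute('CREATE TEMP TABLE dates (date TEXT PRIMARY KEY)')
curs.executemany('INSERT INTO dates VALUES (?)', [(d,) for d in dates])

curs.execute('''
    SELECT p.price, t.date
    FROM ( SELECT date FROM dates ) t
    LEFT JOIN prices p
    ON p.date = t.date
''')
result = curs.fetchall()
```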
While this is an improvement, my timeit tests (see below) show that doing part of the work in Python is still faster:
Doing part of the work in Python:
If you know that the dates are consecutive and can therefore be expressed as a range, then:
curs.execute('''
    SELECT date, price
    FROM prices
    WHERE date <= ?
    AND date >= ?
    AND id = ?''', (max(dates), min(dates), id))
Otherwise, if the dates are arbitrary then:
sql = '''
    SELECT date, price
    FROM prices
    WHERE date IN ({s})
    AND id = ?'''.format(s=','.join(['?'] * len(dates)))
curs.execute(sql, dates + [id])
To form the result list with None inserted for missing prices, you could form a dict out of (date, price) pairs, and use the dict.get() method to supply the default value None when the date key is missing:
result = dict(curs.fetchall())
result = [(result.get(d, None), d) for d in dates]
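Putting the range query and the dict lookup together, a self-contained sketch with made-up sample data (two of five dates present):

```python
import sqlite3

# Made-up sample data for illustration.
conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE prices (id TEXT, date TEXT, price REAL)')
curs.executemany('INSERT INTO prices VALUES (?, ?, ?)',
                 [('361-442', '2008-01-02', 2.0),
                  ('361-442', '2008-01-04', 4.0)])

dates = ['2008-01-0%d' % i for i in range(1, 6)]
id = '361-442'

curs.execute('''
    SELECT date, price
    FROM prices
    WHERE date <= ?
    AND date >= ?
    AND id = ?''', (max(dates), min(dates), id))
lookup = dict(curs.fetchall())
result = [(lookup.get(d), d) for d in dates]
# result: [(None, '2008-01-01'), (2.0, '2008-01-02'), (None, '2008-01-03'),
#          (4.0, '2008-01-04'), (None, '2008-01-05')]
```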
Note that to form the dict as a mapping from dates to prices, I swapped the order of date and price in the SQL queries.
Timeit tests:
I compared these three functions:
def using_sqlite_union():
    sql = '''
        SELECT p.price, t.date
        FROM ( {t} ) t
        LEFT JOIN prices p
        ON p.date = t.date
    '''.format(t=' UNION ALL '.join('SELECT {d!r} date'.format(d=str(d))
                                    for d in dates))
    cursor.execute(sql)
    return cursor.fetchall()
def using_sqlite_dates():
    sql = '''
        SELECT p.price, t.date
        FROM ( SELECT date FROM dates ) t
        LEFT JOIN prices p
        ON p.date = t.date
    '''
    cursor.execute(sql)
    return cursor.fetchall()
def using_python_dict():
    cursor.execute('''
        SELECT date, price
        FROM prices
        WHERE date <= ?
        AND date >= ?
        ''', (max(dates), min(dates)))
    result = dict(cursor.fetchall())
    result = [(result.get(d, None), d) for d in dates]
    return result
N = 500
m = 10
omit = random.sample(range(N), m)
dates = [datetime.date(2000, 1, 1) + datetime.timedelta(days=i) for i in range(N)]
rows = [(d, random.random()) for i, d in enumerate(dates) if i not in omit]
rows defined the data which was INSERTed into the prices table.
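The table setup itself is not shown in the answer; it might be sketched like this (the schema and column types are assumptions, and dates are stored as ISO strings so no sqlite3 date adapter is needed):

```python
import datetime
import random
import sqlite3

N = 500
m = 10
omit = random.sample(range(N), m)
# ISO-format strings sort the same way the dates do, so the range
# query with max()/min() still works.
dates = [str(datetime.date(2000, 1, 1) + datetime.timedelta(days=i))
         for i in range(N)]
rows = [(d, random.random()) for i, d in enumerate(dates) if i not in omit]

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE prices (date TEXT PRIMARY KEY, price REAL)')
cursor.executemany('INSERT INTO prices VALUES (?, ?)', rows)
conn.commit()
```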
Timeit test results:
Running timeit like this:
python -mtimeit -s'import timeit_sqlite_union as t' 't.using_python_dict()'
produced these benchmarks:
·────────────────────·────────────────────·
│ using_python_dict │ 1.47 msec per loop │
│ using_sqlite_dates │ 3.39 msec per loop │
│ using_sqlite_union │ 5.69 msec per loop │
·────────────────────·────────────────────·
using_python_dict is about 2.3 times faster than using_sqlite_dates. Even if we increase the total number of dates to 10000, the speed ratio remains the same:
·────────────────────·────────────────────·
│ using_python_dict │ 32.5 msec per loop │
│ using_sqlite_dates │ 81.5 msec per loop │
·────────────────────·────────────────────·
Conclusion: shifting all the work into sqlite is not necessarily faster.
Upvotes: 5