Is there a way to do something like this inside the SQL query? Maybe provide a list as an input argument? The dates I want are consecutive, but not all dates exist in the database. If a date doesn't exist, the result should be None.
dates = [dt.datetime(2008, 1, 1), dt.datetime(2008, 1, 2), dt.datetime(2008, 1, 3),
         dt.datetime(2008, 1, 4), dt.datetime(2008, 1, 5)]
id = "361-442"
result = []
for date in dates:
    curs.execute('''SELECT price, date FROM prices WHERE date = ? AND id = ?''', (date, id))
    query = curs.fetchall()
    if query == []:
        result.append([None, date])
    else:
        result.append(query)
Upvotes: 4
Views: 7103
To perform all the work in sqlite, you could use a LEFT JOIN to fill in missing prices with None:
sql = '''
    SELECT p.price, t.date
    FROM ( {t} ) t
    LEFT JOIN prices p
    ON p.date = t.date AND p.id = ?
'''.format(t=' UNION ALL '.join('SELECT {d!r} date'.format(d=str(d)) for d in dates))
cursor.execute(sql, [id])
result = cursor.fetchall()
However, this solution requires forming a (potentially) huge string in Python in order to create a temporary table of all desired dates. It is not only slow (including the time it takes sqlite to build the temporary table), it is also brittle: if len(dates) is greater than about 500, then sqlite raises
OperationalError: too many terms in compound SELECT
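One way to sidestep that limit is to issue the query in chunks of at most 500 dates and concatenate the results in Python. Here is a minimal sketch; the toy prices table and sample data are assumptions for illustration:

```python
import sqlite3

# Assumed toy schema and data; the real prices table is much larger.
conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE prices (id TEXT, date TEXT, price REAL)')
curs.execute("INSERT INTO prices VALUES ('361-442', '2008-01-01', 1.5)")

dates = ['2008-01-01', '2008-01-02', '2008-01-03']
id = '361-442'

CHUNK = 500  # stay under sqlite's default compound-SELECT term limit
result = []
for i in range(0, len(dates), CHUNK):
    chunk = dates[i:i + CHUNK]
    sql = '''
        SELECT p.price, t.date
        FROM ( {t} ) t
        LEFT JOIN prices p
        ON p.date = t.date AND p.id = ?
    '''.format(t=' UNION ALL '.join('SELECT {d!r} date'.format(d=d)
                                    for d in chunk))
    curs.execute(sql, [id])
    result.extend(curs.fetchall())
```

Each chunk still pays the string-building cost, so this only works around the error; it does not make the approach fast.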
You might be able to get around this if you already have all the desired dates in some other table. Then you could replace the ugly UNION ALL SQL above with something like:
SELECT p.price, t.date
FROM ( SELECT date FROM dates ) t
LEFT JOIN prices p
ON p.date = t.date
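If no such table exists yet, one could be created and filled with executemany, which avoids building a giant SQL string. A sketch, with an assumed toy prices table and an assumed dates/date schema for the helper table:

```python
import sqlite3

# Assumed toy data; the real prices table already exists.
conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE prices (id TEXT, date TEXT, price REAL)')
curs.execute("INSERT INTO prices VALUES ('361-442', '2008-01-02', 2.0)")

dates = ['2008-01-01', '2008-01-02', '2008-01-03']

# A TEMP table is private to this connection and vanishes when it closes.
curs.execute('CREATE TEMP TABLE dates (date TEXT PRIMARY KEY)')
curs.executemany('INSERT INTO dates VALUES (?)', [(d,) for d in dates])

curs.execute('''
    SELECT p.price, t.date
    FROM ( SELECT date FROM dates ) t
    LEFT JOIN prices p
    ON p.date = t.date
''')
result = curs.fetchall()
```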
While this is an improvement, my timeit tests (see below) show that doing part of the work in Python is still faster:
Doing part of the work in Python:
If you know that the dates are consecutive and can therefore be expressed as a range, then:
curs.execute('''
    SELECT date, price
    FROM prices
    WHERE date <= ?
    AND date >= ?
    AND id = ?''', (max(dates), min(dates), id))
Otherwise, if the dates are arbitrary then:
sql = '''
    SELECT date, price
    FROM prices
    WHERE date IN ({s})
    AND id = ?'''.format(s=','.join(['?'] * len(dates)))
curs.execute(sql, dates + [id])
To form the result list with None inserted for missing prices, you could form a dict out of (date, price) pairs, and use the dict.get() method to supply the default value None when the date key is missing:
result = dict(curs.fetchall())
result = [(result.get(d, None), d) for d in dates]
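Putting the range query and the dict lookup together, a self-contained sketch with made-up sample data (two of five dates present):

```python
import sqlite3

# Made-up sample data for illustration.
conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE prices (id TEXT, date TEXT, price REAL)')
curs.executemany('INSERT INTO prices VALUES (?, ?, ?)',
                 [('361-442', '2008-01-02', 2.0),
                  ('361-442', '2008-01-04', 4.0)])

dates = ['2008-01-0%d' % i for i in range(1, 6)]
id = '361-442'

curs.execute('''
    SELECT date, price
    FROM prices
    WHERE date <= ?
    AND date >= ?
    AND id = ?''', (max(dates), min(dates), id))
lookup = dict(curs.fetchall())
result = [(lookup.get(d), d) for d in dates]
# result: [(None, '2008-01-01'), (2.0, '2008-01-02'), (None, '2008-01-03'),
#          (4.0, '2008-01-04'), (None, '2008-01-05')]
```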
Note that to form the dict as a mapping from dates to prices, I swapped the order of date and price in the SQL queries.
Timeit tests:
I compared these three functions:
def using_sqlite_union():
    sql = '''
        SELECT p.price, t.date
        FROM ( {t} ) t
        LEFT JOIN prices p
        ON p.date = t.date
    '''.format(t=' UNION ALL '.join('SELECT {d!r} date'.format(d=str(d))
                                    for d in dates))
    cursor.execute(sql)
    return cursor.fetchall()
def using_sqlite_dates():
    sql = '''
        SELECT p.price, t.date
        FROM ( SELECT date FROM dates ) t
        LEFT JOIN prices p
        ON p.date = t.date
    '''
    cursor.execute(sql)
    return cursor.fetchall()
def using_python_dict():
    cursor.execute('''
        SELECT date, price
        FROM prices
        WHERE date <= ?
        AND date >= ?
        ''', (max(dates), min(dates)))
    result = dict(cursor.fetchall())
    result = [(result.get(d, None), d) for d in dates]
    return result
N = 500
m = 10
omit = random.sample(range(N), m)
dates = [datetime.date(2000, 1, 1) + datetime.timedelta(days=i) for i in range(N)]
rows = [(d, random.random()) for i, d in enumerate(dates) if i not in omit]
rows defined the data which was INSERTed into the prices table.
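The table setup itself is not shown in the answer; it might be sketched like this (the schema and column types are assumptions, and dates are stored as ISO strings so no sqlite3 date adapter is needed):

```python
import datetime
import random
import sqlite3

N = 500
m = 10
omit = random.sample(range(N), m)
# ISO-format strings sort the same way the dates do, so the range
# query with max()/min() still works.
dates = [str(datetime.date(2000, 1, 1) + datetime.timedelta(days=i))
         for i in range(N)]
rows = [(d, random.random()) for i, d in enumerate(dates) if i not in omit]

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE prices (date TEXT PRIMARY KEY, price REAL)')
cursor.executemany('INSERT INTO prices VALUES (?, ?)', rows)
conn.commit()
```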
Timeit test results:
Running timeit like this:
python -mtimeit -s'import timeit_sqlite_union as t' 't.using_python_dict()'
produced these benchmarks:
·────────────────────·────────────────────·
│ using_python_dict │ 1.47 msec per loop │
│ using_sqlite_dates │ 3.39 msec per loop │
│ using_sqlite_union │ 5.69 msec per loop │
·────────────────────·────────────────────·
using_python_dict is about 2.3 times faster than using_sqlite_dates. Even if we increase the total number of dates to 10000, the speed ratio remains the same:
·────────────────────·────────────────────·
│ using_python_dict │ 32.5 msec per loop │
│ using_sqlite_dates │ 81.5 msec per loop │
·────────────────────·────────────────────·
Conclusion: shifting all the work into sqlite is not necessarily faster.
Upvotes: 5