Ebrahim Ghasemi
Ebrahim Ghasemi

Reputation: 6126

How to extract correct data from Sqlite database using Python?

I have a database of people names and their birthdays. The format of birthday is mm/dd/yyyy, like "3/13/1960".

I want to extract a list of people who are born after a specific date. I called this date "base".

The program that you see below, firstly creates a DB of people (to simulate the real DB that I want to work with), and then extracts the required list. The problem is that the result is not as I expect:

import datetime as dt
import peewee as pw
db = pw.SqliteDatabase('people1.db')

class Person(pw.Model):
    name = pw.CharField()
    birthday = pw.DateField(formats=['%m/%d/%Y'])
    class Meta:
        database = db # This model uses the "people.db" database.

db.create_tables([Person])


bob0 = Person(name='Bob0', birthday='4/13/1940')
bob1 = Person(name='Bob1', birthday='5/13/1950')
bob2 = Person(name='Bob2', birthday='3/13/1960')
bob3 = Person(name='Bob3', birthday='3/13/1970')
bob4 = Person(name='Bob4', birthday='3/13/1980')
bob5 = Person(name='Bob5', birthday='3/13/1990')

base = Person(name="base", birthday='3/13/1960')

bob0.save()
bob1.save()
bob2.save()
bob3.save()
bob4.save()
bob5.save()
base.save()

for item in Person.select().where(Person.birthday > base.birthday):
    print item.name , item.birthday

Output:

>>> ================================ RESTART ================================
>>> 
Bob0 1940-04-13
Bob1 1950-05-13
Bob3 1970-03-13
Bob4 1980-03-13
Bob5 1990-03-13
>>> 

As you see above, base = 3/13/1960. So I shouldn't have Bob0 and Bob1 in output! How can I handle it?

Note that, I don't want to change the format of birthdays in the database. I also don't want to fetch all the rows and check them later! I want to fetch the required rows only.

Upvotes: 4

Views: 1489

Answers (4)

Nizam Mohamed
Nizam Mohamed

Reputation: 9240

Just some chaining of SQL functions looks funny but works, maybe faster.

from datetime import datetime

import peewee as pw

db = pw.SqliteDatabase('people1.db')

class Person(pw.Model):
    name = pw.CharField()
    birthday = pw.DateField(formats=['%m/%d/%Y'])
    class Meta:
        database = db # This model uses the "people.db" database.

db.create_tables([Person])


bob0 = Person(name='Bob0', birthday='4/13/1940')
bob1 = Person(name='Bob1', birthday='5/13/1950')
bob2 = Person(name='Bob2', birthday='3/13/1960')
bob3 = Person(name='Bob3', birthday='3/13/1970')
bob4 = Person(name='Bob4', birthday='3/13/1980')
bob5 = Person(name='Bob5', birthday='3/13/1990')
bob6 = Person(name='Bob6', birthday='12/1/1990')

base = Person(name="base", birthday='3/13/1960')

bob0.save()
bob1.save()
bob2.save()
bob3.save()
bob4.save()
bob5.save()
bob6.save()
base.save()

month = 'substr(birthday,1,instr(birthday,"/")-1)'
iso_month = 'case when length({month}) = 1 then "0" || {month} else {month} end'.format(month=month)
day = 'trim(trim(birthday,"0123456789"),"/")'
iso_day = 'case when length({day}) = 1 then "0" || {day} else {day} end'.format(day=day)
year = 'substr(ltrim(ltrim(birthday,"0123456789"),"/"),instr(ltrim(ltrim(birthday,"0123456789"),"/"),"/")+1)'
iso_date = 'replace(replace(replace("yyyy-mm-dd","yyyy",{year}),"mm",{iso_month}),"dd",{iso_day})'.format(year=year,iso_month=iso_month,iso_day=iso_day)
iso_base = datetime.strptime(base.birthday,'%m/%d/%Y').date().isoformat()

if __name__ == '__main__':

    for item in Person.select().where(pw.SQL(iso_date) > iso_base):
        print item.name , item.birthday
#output
#Bob3 1970-03-13
#Bob4 1980-03-13
#Bob5 1990-03-13
#Bob6 1990-12-01

Upvotes: 0

coleifer
coleifer

Reputation: 26225

SQLite stores dates as strings. For this reason they should be stored as YYYY-MM-DD, this ensures they will be sorted correctly. There is no reason to do otherwise, honestly.

If you look at sqlite's docs, it does not even recognize dates in the format you're using:

https://www.sqlite.org/lang_datefunc.html

So, my advice is to update the way you're storing dates.

Otherwise, create a user-defined function that does the correct thing using strptime (assumes the use of playhouse.sqlite_ext.SqliteExtDatabase):

@db.func()
def fix_time(s):
    return datetime.datetime.strptime(s, '%m/%d/%Y').strftime('%Y-%m-%d')

If you want to stick with regular old SqliteDatabase, you can call the sqlite3 method connection.create_function.

Upvotes: -1

warvariuc
warvariuc

Reputation: 59674

SQlite stores date-times as strings. So as other have suggested in comments and other answers you should use a different format for storing dates, so that "date ordering and lexical ordering work out the same":

import datetime as dt
import peewee as pw

db = pw.SqliteDatabase('people1.db')

class Person(pw.Model):
    name = pw.CharField()
    birthday = pw.DateField(formats=['%Y-%m-%d'])
    class Meta:
        database = db # This model uses the "people.db" database.

db.create_tables([Person])


Person.create(name='Bob0', birthday=dt.date(1940, 4, 13))
Person.create(name='Bob1', birthday=dt.date(1950, 5, 13))
Person.create(name='Bob2', birthday=dt.date(1960, 3, 13))
Person.create(name='Bob3', birthday=dt.date(1970, 3, 13))
Person.create(name='Bob4', birthday=dt.date(1980, 3, 13))
Person.create(name='Bob5', birthday=dt.date(1990, 3, 13))

base = Person.create(name="base", birthday=dt.date(1960, 3, 13))

for item in Person.select().where(Person.birthday > base.birthday):
    print item.name , item.birthday

This gives:

Bob3 1970-03-13
Bob4 1980-03-13
Bob5 1990-03-13

UPDATE

I haven't noticed your comment that you don't want to change the database.

Here is a crazy way to extract parts of the date:

SELECT
    birthday,
    CAST(substr(birthday, 1, instr(birthday, '/') - 1) AS integer),
    CAST(substr(substr(birthday, instr(birthday, '/') + 1), 1, instr(substr(birthday, instr(birthday, '/') + 1), '/') - 1) AS integer),
    CAST(substr(birthday, instr(birthday, '/') + instr(substr(birthday, instr(birthday, '/') + 1), '/') + 1) AS integer)
FROM person

which on my test data gives:

4/13/1940   4   13  1940
12/13/1950  12  13  1950
3/3/1960    3   3   1960
3/25/1970   3   25  1970
3/13/1980   3   13  1980
3/13/1990   3   13  1990
3/13/1960   3   13  1960

You can use these expressions to compare them with parts of the given date:

query = """
SELECT *
FROM person
WHERE
    (
        substr('0000' || CAST(substr(birthday, instr(birthday, '/') + instr(substr(birthday, instr(birthday, '/') + 1), '/') + 1) AS integer), -4, 4) || '-' || -- year
        substr('00' || CAST(substr(birthday, 1, instr(birthday, '/') - 1) AS integer), -2, 2) || '-' || -- month
        substr('00' || CAST(substr(substr(birthday, instr(birthday, '/') + 1), 1, instr(substr(birthday, instr(birthday, '/') + 1), '/') - 1) AS integer), -2, 2) -- day
    ) > '1960-03-03'
"""
for item in Person.raw(query):
    print item.name, item.birthday

I am reconstructing ISO date here and use it for comparison.

Upvotes: 3

Jeremy Allen
Jeremy Allen

Reputation: 6644

You can use sqlite3.Connection.create_function to specify your own sqlite function that will transform your dates into something that can be lexicographically sorted:

import datetime as dt
import peewee as pw

# custom sqlite function to reformat our date string
def _sqlite_reformat_date(unfortunate_date_string):
    return dt.datetime \
        .strptime(unfortunate_date_string,'%m/%d/%Y') \
        .strftime('%Y-%m-%d')

# Subclass pw.SqliteDatabase to add our custom sqlite function
class MySqliteDatabase(pw.SqliteDatabase):
    def __init__(self, *args, **kwargs):
        super(MySqliteDatabase, self).__init__(*args, **kwargs)

    def _add_conn_hooks(self, conn):
        conn.create_function('reformat_date', 1, _sqlite_reformat_date)
        super(MySqliteDatabase, self)._add_conn_hooks(conn)

db = MySqliteDatabase('people1.db')

# ...
# Your model definition and data inserts from your example above
# ...

rd = pw.fn.reformat_date # Use our custom sqlite function
for item in Person.select().where(rd(Person.birthday) > rd(base.birthday)):
    print item.name , item.birthday

Although this approach will 'fetch the required rows only' it will still run this python function for every row! It is little better than just fetching all the rows in doing the date comparison in python, it may even be slower!

However the _sqlite_reformat_date function could be refactored to be much faster and its nice to know how easy it is to add a custom function into sqlite.

Upvotes: 2

Related Questions