TomNash
TomNash

Reputation: 3288

sqlite3 different result in console compared to Python script

Below is a dummy example to demonstrate how the query results are different, the real query is more more complex so the query structure may seem overkill in this example. Set up a connection to a sqlite3 database and add these records to start with:

import sqlite3

connection = sqlite3.connect(
    'file:test_database',
    detect_types=sqlite3.PARSE_DECLTYPES,
    isolation_level=None,
    check_same_thread=False,
    uri=True
)

cursor = connection.cursor()

tableA_records = [(1, 202003), (2, 202003), (3, 202003), (4, 202004), (5, 202004), (6, 202004), (7, 202004), (8, 202004), ]
tableB_records = [(1, 202004), (2, 202004), (3, 202004), (4, 202004), (5, 202004),]

tableA_ddl = """
    create table tableA
    (
        ID           int,
        RunYearMonth int
    );
"""

tableB_ddl = """
    create table tableB
    (
        ID           int,
        RunYearMonth int
    );
"""

cursor.execute(tableA_ddl)
cursor.execute(tableB_ddl)

cursor.executemany("INSERT INTO tableA VALUES (?, ?)", tableA_records)
cursor.executemany("INSERT INTO tableB VALUES (?, ?)", tableB_records)

Now we have two tables (A and B) with 8 and 5 records, respectively. I want to count records that have the same ID and date between the two when the date is 202004.

I have this query now:

SELECT COUNT(*)
    FROM (
        SELECT *
        FROM `tableA`
        WHERE `RunYearMonth` = 202004
    ) AS `A`
    INNER JOIN (
        SELECT *
        FROM `tableB`
        WHERE `RunYearMonth` = 202004
    ) AS `B`
      ON `A`.`ID` = `B`.`ID`
      AND `A`.`RunYearMonth` = `B`.`RunYearMonth`

This, as expected, returns 2 when run in a sqlite console.

When run in Python, though, you get a different result.

q = """
SELECT COUNT(*)
    FROM (
        SELECT *
        FROM `tableA`
        WHERE `RunYearMonth` = 202004
    ) AS `map1`
    INNER JOIN (
        SELECT *
        FROM `tableB`
        WHERE `RunYearMonth` = 202004
    ) AS `map2`
      ON `map1`.`ID` = `map2`.`ID`
      AND `map1`.`RunYearMonth` = `map2`.`RunYearMonth`
"""
cursor.execute(q)
print(cursor.fetchall())

This returns instead 5 which effectively ignores the WHERE clauses in the subqueries and the join condition they have the same RunYearMonth, there are records 1-5 in both.

What would cause this difference? Does Python not simply pass the query string through?

Pertinent versions:

sqlite3.version == 2.6.0
sqlite3.sqlite_version == 3.31.1
sys.version == 3.6.5

Upvotes: 0

Views: 202

Answers (1)

Shawn
Shawn

Reputation: 52374

I created a test database using your first script, and then opened it in a sqlite3 shell. Your query returns 5 in it, not the 2 you're getting. After changing it to show all the rows, not just the count, it results in:

ID          RunYearMonth  ID          RunYearMonth
----------  ------------  ----------  ------------
1           202003        1           202004
2           202003        2           202004
3           202003        3           202004
4           202004        4           202004
5           202004        5           202004

I'm not sure why those rows from tableA with RunYearMonth of 202003 are getting included; I'd think they would be filtered out by the subquery's WHERE.

This appears to be a bug in Sqlite3 - using an older version (3.11.0) gives the expected results, and a slight tweak to the query to remove the AND map1.RunYearMonth = map2.RunYearMonth produces the correct results on 3.31.1.


Anyways, that query can be cleaned up significantly, like so:

SELECT count(*)
FROM tableA AS A
JOIN tableB AS B ON A.ID = B.ID
                AND A.RunYearMonth = B.RunYearMonth
WHERE A.RunYearMonth = 202004;

which does return the expected count of 2.

Upvotes: 1

Related Questions