Scott
Scott

Reputation: 121

Why does SQLite return the wrong value from a subquery?

Given a schema and data in SQLite 3.7.17 (I'm stuck with this version):

CREATE TABLE reservations (id INTEGER NOT NULL PRIMARY KEY,NodeID INTEGER,ifIndex INTEGER,dt TEXT,action TEXT,user TEXT,p TEXT);
INSERT INTO "reservations" VALUES(1,584,436211200,'2022-03-12 10:10:00','R','s','x');
INSERT INTO "reservations" VALUES(2,584,436211200,'2022-03-12 10:10:01','R','s','x');
INSERT INTO "reservations" VALUES(3,584,436211200,'2022-03-12 10:10:05','U','s','x');
INSERT INTO "reservations" VALUES(4,584,436211200,'2022-03-12 10:09:01','R','s','x');

I'm trying to get the most recent action for each pair of (NodeID,ifIndex).

Running SELECT MAX(dt),action FROM reservations GROUP BY NodeId,ifIndex; I get:

MAX(dt)|action
2022-03-12 10:10:05|U

Perfect. Now I want to select just the action from this query (dropping the MAX(dt)): SELECT t.action FROM (SELECT MAX(dt),action FROM reservations GROUP BY NodeId,ifIndex) AS t;:

t.action
R

This I don't understand. Also: SELECT t.* FROM (SELECT MAX(dt),action FROM reservations GROUP BY NodeId,ifIndex) AS t;:

MAX(dt)|action
2022-03-12 10:10:05|U

gives the correct value. So why does the query not seem to be querying against the subquery? Perhaps it's a bug in this version of SQLite as SQLFiddle works fine (http://sqlfiddle.com/#!7/f7619a/4)

In attempt to workaround this issue I use this query: SELECT t2.action FROM (SELECT MAX(dt),* FROM reservations GROUP BY NodeId,ifIndex) AS t1 INNER JOIN reservations AS t2 on t1.id = t2.i which seems to work:

action
U

Upvotes: 1

Views: 189

Answers (1)

Thorsten Kettner
Thorsten Kettner

Reputation: 94859

You are right, this seems to be a bug in your SQLite version.

To get into more detail, you are using SQLite's GROUP BY extension "Bare columns in an aggregate query".

In standard SQL and almost all RDBMS your query

SELECT MAX(dt), action FROM reservations GROUP BY NodeId, ifIndex;

is invalid. Why is that? You group by NodeId and ifIndex, thus aggregating your data down to one result row per NodeId and ifIndex. In each such row you want to show the group's maximum date and the group's action. But while there is one maximum date for a group, there is no one action for it, but several. Your query is considered invalid in standard SQL, because you don't tell the DBMS which of the group's actions you want to see. This could be the minimum action for example (i.e. the first in alphabetical order). That means there must be an aggregation function invoked on that column.

Not so in SQLite. When SQLite finds a "bare column" in a GROUP BY query that is meant to find a MAX or MIN of a column, it considers this to mean to take the bare column's value from the row where the minimum or maximum is found in. This is an extension to the SQL standard, and SQLite is the only DBMS I know of to feature this. You can read about this in the SQLite docs: Search "Bare columns in an aggregate query" in https://www.sqlite.org/lang_select.html#resultset.

SELECT MAX(dt), action FROM reservations GROUP BY NodeId, ifIndex;

hence finds the action in the row with the maximum dt. If you selected MIN(dt) instead, it would get you the action of the row with the minimum dt.

And of course a query selecting from a subquery result should still get the same value. It seems, however, that in your version SQLite gets confused with its bare column detection. It doesn't throw an error telling you it doesn't know which action to select, but it doesn't select the maximum dt's action either. Obviously a bug.

In standard SQL (and almost any RDBMS) your original query would be written like this:

SELECT dt, action 
FROM reservations r
WHERE dt =
(
  SELECT MAX(dt)
  FROM reservations mr
  WHERE mr.NodeId = r.NodeId AND mr.ifIndex = r.ifIndex
);

or like this:

SELECT dt, action 
FROM reservations r
WHERE NOT EXISTS
(
  SELECT NULL
  FROM reservations gr
  WHERE gr.NodeId = r.NodeId
  AND gr.ifIndex = r.ifIndex
  AND gr.dt > r.dt
);

or like this:

  SELECT dt, action
  FROM
  (
    SELECT dt, action, MAX(dt) OVER (PARTITION BY NodeId, ifIndex) AS max_dt
    FROM reservations
  ) with_max_dt
  WHERE dt = max_dt;

And there are still other ways to get the top row(s) per group.

In any of these proper SQL queries, you can remove dt from the select list and still get the maximum dt's action.

Upvotes: 1

Related Questions