Ravi
Ravi

Reputation: 3756

SQL: select most recent event of certain type

My database (sqlite3), has two tables with the following schemas:

CREATE TABLE log(d date, usr text, tag text, bytes int);
CREATE TABLE valid(tag text, filesize int);

Let's say I have the following sample data:

      d    | usr  |  tag | bytes
--------------------------------
2012-01-19 | bob  |  foo | 990
2012-01-18 | bob  |  foo | 1000
2012-01-17 | joe  |  bar | 2000
2012-01-16 | joe  |  bar | 1800
2012-01-15 | joe  |  baz | 0


tag  | size
-----------
foo  | 1000
bar  | 2000

I would like to retrieve a list of the most recent events that resulted in a user failing to access all the bytes in a valid file. In the example above, the select should yield 2012-01-19 | bob | foo | 990.

Right now, I'm using two select statements to get the results. The first gets the most recent event per user, and the second checks whether all the bytes were accessed.

CREATE VIEW tmp AS 
SELECT * FROM log JOIN (SELECT max(d) AS maxd, usr FROM log GROUP BY usr) AS 
tmplog ON (tmplog.usr=log.usr and tmplog.maxd=log.d);

SELECT usr,d FROM tmp 
WHERE tag IN (SELECT tag FROM valid) AND bytes NOT IN (SELECT size FROM valid);

Is there a way to do this with one select or more efficiently?

Update:

In the example above, the query shouldn't retrieve user "joe" since his most recent log entry shows a complete file being accessed.

Upvotes: 3

Views: 3729

Answers (4)

gcbenison
gcbenison

Reputation: 11963

SELECT log.* FROM
  log
  JOIN
  (SELECT usr,tag,MAX(d)
   FROM log
   GROUP BY usr,tag) log2
  USING(usr,tag)
  JOIN valid ON (log.tag = valid.tag)
  WHERE d=`max(d)` AND bytes != valid.filesize

Upvotes: 1

pilcrow
pilcrow

Reputation: 58524

Find the most recent d per usr and tag (where bytes < filesize), then select those log entries matching those most recent d/usr/tag values:

    SELECT log.*
      FROM log
INNER JOIN (    SELECT usr, tag, MAX(d) AS d
                  FROM log
              GROUP BY 1, 2) most_recent
        ON most_recent.usr = log.usr
            AND
           most_recent.tag = log.tag
            AND
           most_recent.d = log.d
INNER JOIN valid
        ON log.tag = valid.tag
     WHERE log.bytes < valid.filesize;

Upvotes: 1

Andriy M
Andriy M

Reputation: 77657

The closest I can suggest so far:

  1. Join log and valid to get all the rows where bytes <= size and to find out MAX(d) per tag.

  2. Join the result set back to log on tag & d to get the entire rows pertaining to the corresponding events.

  3. Join again to valid to get only the rows where bytes < size. That is necessary to filter out the max dates where the bytes matched the size.

This is how the query might look:

SELECT
  log.*
FROM log
  INNER JOIN valid ON log.tag = valid.tag AND log.bytes < valid.size
  INNER JOIN (
    SELECT
      log.tag,
      MAX(log.d) AS d
    FROM log
      INNER JOIN valid ON log.tag = valid.tag AND log.bytes <= valid.size
    GROUP BY
      log.tag
  ) last
    ON log.tag = last.tag AND log.d = last.d

The above assumes that on any date there can only be one event for particular tag. Otherwise the schema presented in the question doesn't allow one to determine which event was the latest one if more than one happen on the same date.

Upvotes: 0

p.g.l.hall
p.g.l.hall

Reputation: 1961

Select
log.*
From log
Inner Join valid On valid.tag = log.tag
Where log.bytes < valid.size Order By log.d Desc

Should work, but doesn't look very efficient to me.

It's slightly different in that it will return multiple rows per user if the same user had more than one issue - do you specifically not want this?

Upvotes: 0

Related Questions