Reputation: 3756
My database (sqlite3), has two tables with the following schemas:
CREATE TABLE log(d date, usr text, tag text, bytes int);
CREATE TABLE valid(tag text, filesize int);
Let's say I have the following sample data:
d | usr | tag | bytes
--------------------------------
2012-01-19 | bob | foo | 990
2012-01-18 | bob | foo | 1000
2012-01-17 | joe | bar | 2000
2012-01-16 | joe | bar | 1800
2012-01-15 | joe | baz | 0
tag | size
-----------
foo | 1000
bar | 2000
I would like to retrieve a list of the most recent events that resulted in a user failing to access all the bytes in a valid file. In the example above, the select should yield 2012-01-19 | bob | foo | 990.
Right now, I'm using two select statements to get the results. The first gets the most recent event per user, and the second checks whether all the bytes were accessed.
CREATE VIEW tmp AS
SELECT * FROM log JOIN (SELECT max(d) AS maxd, usr FROM log GROUP BY usr) AS
tmplog ON (tmplog.usr=log.usr and tmplog.maxd=log.d);
SELECT usr,d FROM tmp
WHERE tag IN (SELECT tag FROM valid) AND bytes NOT IN (SELECT size FROM valid);
Is there a way to do this with one select or more efficiently?
Update:
In the example above, the query shouldn't retrieve user "joe" since his most recent log entry shows a complete file being accessed.
Upvotes: 3
Views: 3729
Reputation: 11963
SELECT log.* FROM
log
JOIN
(SELECT usr,tag,MAX(d)
FROM log
GROUP BY usr,tag) log2
USING(usr,tag)
JOIN valid ON (log.tag = valid.tag)
WHERE d=`max(d)` AND bytes != valid.filesize
Upvotes: 1
Reputation: 58524
Find the most recent d
per usr
and tag
(where bytes < filesize
), then select those log
entries matching those most recent d/usr/tag
values:
SELECT log.*
FROM log
INNER JOIN ( SELECT usr, tag, MAX(d) AS d
FROM log
GROUP BY 1, 2) most_recent
ON most_recent.usr = log.usr
AND
most_recent.tag = log.tag
AND
most_recent.d = log.d
INNER JOIN valid
ON log.tag = valid.tag
WHERE log.bytes < valid.filesize;
Upvotes: 1
Reputation: 77657
The closest I can suggest so far:
Join log
and valid
to get all the rows where bytes <= size
and to find out MAX(d)
per tag
.
Join the result set back to log
on tag
& d
to get the entire rows pertaining to the corresponding events.
Join again to valid
to get only the rows where bytes < size
. That is necessary to filter out the max dates where the bytes matched the size.
This is how the query might look:
SELECT
log.*
FROM log
INNER JOIN valid ON log.tag = valid.tag AND log.bytes < valid.size
INNER JOIN (
SELECT
log.tag,
MAX(log.d) AS d
FROM log
INNER JOIN valid ON log.tag = valid.tag AND log.bytes <= valid.size
GROUP BY
log.tag
) last
ON log.tag = last.tag AND log.d = last.d
The above assumes that on any date there can only be one event for particular tag. Otherwise the schema presented in the question doesn't allow one to determine which event was the latest one if more than one happen on the same date.
Upvotes: 0
Reputation: 1961
Select
log.*
From log
Inner Join valid On valid.tag = log.tag
Where log.bytes < valid.size Order By log.d Desc
Should work, but doesn't look very efficient to me.
It's slightly different in that it will return multiple rows per user if the same user had more than one issue - do you specifically not want this?
Upvotes: 0