a coder
a coder

Reputation: 7639

GROUP BY HAVING not working as expected

I'm struggling with what should be a simple query.

An event table stores user activity in an application. Each click generates a new event and datetime stamp. I need to show a list of recently accessed records having the most recent datetime stamp. I need to only show the past 7 days of activity.

The table has an auto-increment field (eventID), which corresponds with the date_event field, so it's better to use that for determining the most recent record in the group.

I found that some records are not appearing in my results with the expected most recent datetime. So I stripped my query down the basics:

NOTE that the real-life query does not look at custID. I am including it here to narrow down on the problem.

        SELECT
            el.eventID,
            el.custID,
            el.date_event
        FROM  
            event_log el
        WHERE 
            el.custID = 12345 AND
            el.userID=987
        GROUP BY  
            el.custID
        HAVING
            MAX( el.eventID )

This is returned:

eventID     custID  date_event
346290      12345   2013-06-21 09:58:44

Here's the EXPLAIN

id  select_type     table   type    possible_keys               key     key_len     ref     rows    Extra
1   SIMPLE          el      ref     userID,custID,Composite     custID  5           const   203     Using where

If I change the query to use HAVING MIN, the results don't change.. I should see a different eventID and date_event, as there are dozens of records matching the custID and userID.

        SELECT
            el.eventID,
            el.custID,
            el.date_event
        FROM  
            event_log el
        WHERE 
            el.custID = 12345 AND
            el.userID=987
        GROUP BY  
            el.custID
        HAVING
            MIN( el.eventID )

Same results as before:

eventID     custID  date_event
346290      12345   2013-06-21 09:58:44

No change.

This tells me I have another problem, but I am not seeing what that might be.

Some pointers would be appreciated.

Upvotes: 1

Views: 1123

Answers (2)

leonardo_assumpcao
leonardo_assumpcao

Reputation: 686

You can use a group function in a statement containing no GROUP BY clause, but it would be equivalent to grouping on all rows. But I guess you're looking for the common syntax,

SELECT
  MIN(el.eventID) AS `min_eventID`, --> Yes it is wrong :(
  el.custID,
  el.date_event
FROM  
  event_log el
WHERE 
  el.userID = 987
GROUP BY el.custID;

But disagreements are welcome .


[ Edit ]

I think I didn't show a solution fast enough... but maybe you're rather looking for the fastest solution.
Assuming field date_event defaults to CURRENT_TIMESTAMP (am I wrong?), ordering by date_event would be a waste of time (and money, thus).
I've made some tests with 20K rows and execution time was about 5ms.

SELECT STRAIGHT_JOIN y.*
FROM ((
  SELECT MAX(eventId) as eventId
  FROM event_log
  WHERE userId = 987 AND custId = 12345
)) AS x
INNER JOIN event_log AS y
  USING (eventId);

Maybe (possibly, who knows) you didn't get the straight_join thing; as documented on the scriptures, STRAIGHT_JOINs are similar to JOINs, except that the left table is always read before the right table. Sometimes it's useful.
For your specific situation, we're likely to filter to a certain eventID before (on table "x"), not to retrieve 99,99% useless rows from table "y".

  • More disagreements expected in 3, 2, ...

Upvotes: 0

Barmar
Barmar

Reputation: 780655

SELECT
    el.eventID,
    el.custID,
    el.date_event
FROM  
    event_log el
WHERE 
    el.custID = 12345 AND
    el.userID=987 AND
    el.eventID IN (SELECT MAX(eventID)
                   FROM event_log
                   WHERE custID = 12345
                   AND userID = 987)

Your query doesn't work because you misunderstand what HAVING does. It evaluates the expression on each line of the result set, and keeps the rows where the expression evaluates to true. The expression MAX(el.eventID) simply returns the maximum event ID selected by the query, it doesn't compare the current row to that event ID.

Another way is:

SELECT
    el.eventID,
    el.custID,
    el.date_event
FROM  
    event_log el
WHERE 
    el.custID = 12345 AND
    el.userID=987
ORDER BY eventID DESC
LIMIT 1

The more general form that works for multiple custID is:

SELECT el.*
FROM event_log el
JOIN (SELECT custID, max(date_event) maxdate
      FROM event_log
      WHERE userID = 987
      GROUP BY custID) emax
ON el.custID = emax.custID AND el.date_event = emax.maxdate
WHERE el.userID = 987

Upvotes: 3

Related Questions