aborted
aborted

Reputation: 4541

How to get my GROUP BY to take the value of the last row (by date) for a specific column?

I have a query which has a GROUP BY clause. This query takes employees "checkin" times from a records table and if the dn field on is 1 then the employee is in the office, but if it's 2, then he/she left.

So lets say I put records.dn in the GROUP BY clause. This results with duplicate entries for each employee (I get their names one after the other in a table, twice).

I tried using MIN and MAX in the select, but that didn't make sense at all.

So from those multiple rows in the database where an employee might checkin and out multiple times, how do I take the LATEST row and use its dn field?

This is the query I'm talking about:

select 
    MAX(records.dn), 
    `records`.`din`, 
    `users`.`username`, 
    `users`.`id` as `user_id`, 
    DATE(records.created_at) AS date, 
    TIME(MIN(records.created_at)) AS first_log, 
    TIME(MAX(records.created_at)) AS last_log 
from `records` 
inner join `users` on `records`.`din` = `users`.`din` 
where records.created_at BETWEEN '2016-09-05 00:00:00' 
    AND '2016-09-08 00:00:00' 
group by `users`.`username`, DATE(records.created_at) 
order by `first_log` asc

(The first select column is what I tried to do, using MAX)

This is the records table with some sample data:

the table

How do I achieve what I'm looking for?

Upvotes: 2

Views: 68

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269953

The substring_index()/group_concat() trick might do exactly what you want:

select substring_index(group_concat(r.dn order by created_at desc), ',', 1) as last_dn, 
       u.username, u.id as `user_id`, 
       DATE(r.created_at) AS date, 
       TIME(MIN(r.created_at)) AS first_log, 
       TIME(MAX(r.created_at)) AS last_log 
from records r inner join
     users u
     on r.din = u.din 
where r.created_at >= '2016-09-05' and
      r.created_at < '2016-09-08' 
group by u.username, DATE(r.created_at) 
order by first_log asc;

As written, this has three limitations:

  1. It assumes that dn doesn't have a comma in it (that is easily fixed by using a different separator).
  2. It assumes that the list of dns for a user for a day won't exceed the maximum length of a string for group_concat(). That can also be adjusted using a system parameter.
  3. The last_dn is an string, even if that is not the original type.

But this trick works well in most cases. The parameter group_concat_max_len is the size of the intermediate string. Its default (described here) is 1,024. This should be fine for up to a hundred or more integers. It is a session parameter and easily changed.

Also note the following changes:

  • I think unnecessary backticks in a query just make it harder to read.
  • The use of table aliases makes a query easier to write and to read.
  • I don't think you mean between, because it is inclusive. I am guessing that you want date/time values strictly before the 8th.

Upvotes: 1

Related Questions