Twelfth
Twelfth

Reputation: 7180

Top 20 percent by id - MySQL

I am using a modified version of a query similiar to another question here:Convert SQL Server query to MySQL

Select *
from
(
SELECT tbl.*, @counter := @counter +1 counter
FROM (select @counter:=0) initvar, tbl
Where client_id = 55
ORDER BY ordcolumn
) X
where counter >= (80/100 * @counter);
ORDER BY ordcolumn

tbl.* contains the field 'client_id' and I am attempting to get the top 20% of the records for each client_id in a single statement. Right now if I feed it a single client_id in the where statement it gives me the correct results, however if I feed it multiple client_id's it simply takes the top 20% of the combined recordset instead of doing each client_id individually.

I'm aware of how to do this in most databases, but the logic in MySQL is eluding me. I get the feeling it involves some ranking and partitioning.

Sample data is pretty straight forward.

Client_id  rate    
1          1
1          2
1          3
(etc to rate = 100)
2          1
2          2
2          3
(etc to rate = 100)

Actual values aren't that clean, but it works.

As an added bonus...there is also a date field associated to these records and 1 to 100 exists for this client for multiple dates. I need to grab the top 20% of records for each client_id, year(date),month(date)

Upvotes: 2

Views: 211

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269873

You need to do the enumeration for each client:

SELECT *
FROM (SELECT tbl.*, @counter := @counter +1 counter
             (@rn := if(@c = client_id, @rn + 1,
                        if(@c := client_id, 1, 1)
                       )
             )
      FROM (select @c := -1, @rn := 0) initvar CROSS JOIN tbl
      ORDER BY client_id, ordcolumn
     ) t cross join
     (SELECT client_id, COUNT(*) as cnt
      FROM tbl
      GROUP BY client_id
     ) tt
where rn >= (80/100 * tt.cnt);
ORDER BY ordcolumn;

Upvotes: 2

Uueerdo
Uueerdo

Reputation: 15951

Using Gordon's answer as a starting point, I think this might be closer to what you need.

SELECT t.*
    , (@counter := @counter+1) AS overallRow
    , (@clientRow := if(@prevClient = t.client_id, @clientRow + 1,
                if(@prevClient := t.client_id, 1, 1) -- This just updates @prevClient without creating an extra field, though it makes it a little harder to read
               )
     ) AS clientRow
    -- Alteratively (for everything done in clientRow)
    , @clientRow := if(@prevClient = t.client_id, @clientRow + 1, 1) AS clientRow
    , @prevClient := t.client_id AS extraField
    -- This may be more reliable as well; I not sure if the order 
    -- of evaluation of IF(,,) is reliable enough to guarantee
    -- no side effects in the non-"alternatively" clientRow calculation.
FROM tbl AS t
INNER JOIN (
    SELECT client_id, COUNT(*) AS c
    FROM tbl 
    GROUP BY client_id
   ) AS cc ON tbl.client_id = cc.client_id
INNER JOIN (select @prevClient := -1, @clientRow := 0) AS initvar ON 1 = 1
WHERE t.client_id = 55
HAVING clientRow * 5 < cc.c  -- You can use a HAVING without a GROUP BY in MySQL 
     -- (note that clientRow is derived, so you cannot use it in the `WHERE`)
ORDER BY t.client_id, t.ordcolumn
;

Upvotes: 1

Related Questions