Arif
Arif

Reputation: 1211

Fetch 2nd Higest value from MySql DB with GROUP BY

I have a table tbl_patient and I want to fetch last 2 visit of each patient in order to compare whether patient condition is improving or degrading.

tbl_patient

id | patient_ID | visit_ID | patient_result
1  |     1      |   1      |     5
2  |     2      |   1      |     6
3  |     2      |   3      |     7
4  |     1      |   2      |     3
5  |     2      |   3      |     2
6  |     1      |   3      |     9

I tried the query below to fetch the last visit of each patient as,

SELECT MAX(id), patient_result FROM `tbl_patient` GROUP BY `patient_ID`

Now i want to fetch the 2nd last visit of each patient with query but it give me error (#1242 - Subquery returns more than 1 row)

SELECT id, patient_result FROM `tbl_patient` WHERE id <(SELECT MAX(id) FROM `tbl_patient` GROUP BY `patient_ID`) GROUP BY `patient_ID`

Where I'm wrong

Upvotes: 0

Views: 224

Answers (6)

spencer7593
spencer7593

Reputation: 108400

There are a couple of approaches to getting the specified resultset returned in a single SQL statement.

Unfortunately, most of those approaches yield rather unwieldy statements.

The more elegant looking statements tend to come with poor (or unbearable) performance when dealing with large sets. And the statements that tend to have better performance are more un-elegant looking.

Three of the most common approaches make use of:

  • correlated subquery
  • inequality join (nearly a Cartesian product)
  • two passes over the data

Here's an approach that uses two passes over the data, using MySQL user variables, which basically emulates the analytic RANK() OVER(PARTITION ...) function available in other DBMS:


SELECT t.id
     , t.patient_id
     , t.visit_id
     , t.patient_result
  FROM (
         SELECT p.id
              , p.patient_id
              , p.visit_id
              , p.patient_result
              , @rn := if(@prev_patient_id = patient_id, @rn + 1, 1) AS rn
              , @prev_patient_id := patient_id AS prev_patient_id
           FROM tbl_patients p
           JOIN (SELECT @rn := 0, @prev_patient_id := NULL) i
          ORDER BY p.patient_id DESC, p.id DESC
       ) t
WHERE t.rn <= 2

Note that this involves an inline view, which means there's going to be a pass over all the data in the table to create a "derived tabled". Then, the outer query will run against the derived table. So, this is essentially two passes over the data.

This query can be tweaked a bit to improve performance, by eliminating the duplicated value of the patient_id column returned by the inline view. But I show it as above, so we can better understand what is happening.

This approach can be rather expensive on large sets, but is generally MUCH more efficient than some of the other approaches.

Note also that this query will return a row for a patient_id if there is only one id value exists for that patient; it does not restrict the return to just those patients that have at least two rows.


It's also possible to get an equivalent resultset with a correlated subquery:

SELECT t.id
     , t.patient_id
     , t.visit_id
     , t.patient_result
  FROM tbl_patients t
 WHERE ( SELECT COUNT(1) AS cnt
           FROM tbl_patients p
          WHERE p.patient_id = t.patient_id
            AND p.id >= t.id
       ) <= 2
 ORDER BY t.patient_id ASC, t.id ASC

Note that this is making use of a "dependent subquery", which basically means that for each row returned from t, MySQL is effectively running another query against the database. So, this will tend to be very expensive (in terms of elapsed time) on large sets.


As another approach, if there are relatively few id values for each patient, you might be able to get by with an inequality join:

SELECT t.id
     , t.patient_id
     , t.visit_id
     , t.patient_result
  FROM tbl_patients t
  LEFT
  JOIN tbl_patients p
    ON p.patient_id = t.patient_id 
   AND t.id < p.id
 GROUP
    BY t.id
     , t.patient_id
     , t.visit_id
     , t.patient_result
HAVING COUNT(1) <= 2

Note that this will create a nearly Cartesian product for each patient. For a limited number of id values for each patient, this won't be too bad. But if a patient has hundreds of id values, the intermediate result can be huge, on the order of (O)n**2.

Upvotes: 1

remigio
remigio

Reputation: 4211

Your first query doesn't get the last visits, since it gives results 5 and 6 instead of 2 and 9. You can try this query:

SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
    select max(id) 
    from tbl_patient
    GROUP BY patient_ID)
union
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
    select max(id) 
    from tbl_patient
    where id not in (
        select max(id) 
        from tbl_patient
        GROUP BY patient_ID)
    GROUP BY patient_ID)
order by 1,2

Upvotes: 2

Barmar
Barmar

Reputation: 780984

select p1.patient_id, p2.maxid id1, max(p1.id) id2
from tbl_patient p1
join (select patient_id, max(id) maxid
      from tbl_patient
      group by patient_id) p2
on p1.patient_id = p2.patient_id and p1.id < p2.maxid
group by p1.patient_id

id11 is the ID of the last visit, id2 is the ID of the 2nd to last visit.

Upvotes: 4

Adeel Ahmed
Adeel Ahmed

Reputation: 1601

Try this..

SELECT id, patient_result FROM tbl_patient AS tp WHERE id < ((SELECT MAX(id) FROM tbl_patient AS tp_max WHERE tp_max.patient_ID = tp.patient_ID)  - 1) GROUP BY patient_ID

Upvotes: 0

jcjr
jcjr

Reputation: 1503

SELECT id, patient_result FROM `tbl_patient` t1
JOIN (SELECT MAX(id) as max, patient_ID FROM `tbl_patient` GROUP BY `patient_ID`) t2 
      ON t1.patient_ID = t2.patient_ID
WHERE id <max GROUP BY t1.`patient_ID`

Upvotes: 1

Pete St.
Pete St.

Reputation: 19

Why not use simply...

GROUP BY `patient_ID` DESC LIMIT 2

... and do the rest in the next step?

Upvotes: -1

Related Questions