sa102
sa102

Reputation: 59

Grouping SQL by differences between rows?

I'm taking the min and max of several data points, but some of them have the patient going to the same locations on different days. Is there a way to add an index or code that distinguishes them as separate groups, based on an order by statement or something similar? I want to distinguish them as separate groups (by separating that the patient left one location, went to a new one, and then returned to the original location (treating it as a new location in terms of grouping)?

Here's the code to make a dummy data table.

    CREATE TABLE #StartandStop (
     [hospitalname] varchar(255),
     pt_id varchar(255),
     [Location] varchar(255),
     startdatetime datetime,
     enddatetime datetime
     ); 

     INSERT INTO #StartandStop ([hospitalname],[pt_id],[Location],[startdatetime],[enddatetime])
     VALUES ('ClevelandClinic', '1','A','01-01-2020 10:35', '01-02-2020 12:30'),
     ('ClevelandClinic', '1','B','01-02-2020 12:30', '01-03-2020 00:00'),
     ('ClevelandClinic', '1','B','01-03-2020 00:00', '01-04-2020 00:00'),
     ('ClevelandClinic', '1','B','01-04-2020 00:00', '01-04-2020 19:30'),
     ('ClevelandClinic', '1','C','01-04-2020 19:30', '01-05-2020 18:44'),
     ('ClevelandClinic', '1','B','01-05-2020 18:44', '01-05-2020 07:09'),
     ('ClevelandClinic', '1','D','01-05-2020 07:09', '01-05-2020 22:22'),
     ('ClevelandClinic', '1','D','01-05-2020 22:22', '01-06-2020 00:00'),
     ('ClevelandClinic', '1','D','01-06-2020 00:00', '01-07-2020 00:00'),
     ('ClevelandClinic', '1','D','01-07-2020 00:00', '01-08-2020 00:00'),
     ('ClevelandClinic', '1','C','01-08-2020 00:00', '01-08-2020 20:02');

This is the code to get results.

 select hospitalname,pt_id,location,min(startdatetime) as 'Start',max(enddatetime) as 'End'
  from  #StartandStop
 group by hospitalname,pt_id,location

These are the results I get when just taking the minimum start and maximum stop of each, without taking into account the patient's movements back and forth between units.

   hospitalname        pt_id    location    Start                       End
   ClevelandClinic     1         A          2020-01-01 10:35:00.000     2020-01-02 12:30:00.000
   ClevelandClinic     1         B          2020-01-02 12:30:00.000     2020-01-05 07:09:00.000
   ClevelandClinic     1         C          2020-01-04 19:30:00.000     2020-01-08 20:02:00.000
   ClevelandClinic     1         D          2020-01-05 07:09:00.000     2020-01-08 00:00:00.000

These are the results I'm trying to get:

 Hospital        Patient     Location     Start                          Stop
 ClevelandClinic    1        A            2020-01-01 10:35:00.000        2020-01-02 12:30:00.000
 ClevelandClinic    1        B            2020-01-02 12:30:00.000        2020-01-04 19:30:00.000
 ClevelandClinic    1        C            2020-01-04 19:30:00.000        2020-01-05 18:44:00.000
 ClevelandClinic    1        B            2020-01-05 18:44:00.000        2020-01-05 07:09:00.000
 ClevelandClinic    1        D            2020-01-05 07:09:00.000        2020-01-08 00:00:00.000
 ClevelandClinic    1        C            2020-01-08 00:00:00.000        2020-01-08 20:02:00.000

Please advise if there's a way to mark or distinguish first and second visits to the same location as different. Thank you!

Upvotes: 0

Views: 93

Answers (1)

disk eater
disk eater

Reputation: 121

ALTER TABLE #StartandStop ADD color INT

DECLARE @color INT
SELECT @color = 0

UPDATE ss SET color = @color, @color = CASE WHEN s.Location <> s.LastLocation THEN @color + 1 ELSE @color END
FROM #StartandStop ss JOIN (SELECT hospitalname, pt_id, Location, LAG(location) OVER (ORDER BY hospitalname,pt_id,startdatetime,location) AS LastLocation, startdatetime, enddatetime 
    FROM #StartandStop) s ON (ss.hospitalname = s.hospitalname AND ss.pt_id = s.pt_id AND ss.location = s.location AND ss.startdatetime = s.startdatetime AND ss.enddatetime = s.enddatetime)

SELECT hospitalname AS Hospital, pt_id AS Patient, MIN(Location) AS Location, MIN(startdatetime) AS Start, MAX(enddatetime) AS Stop
FROM #StartandStop
GROUP BY hospitalname,pt_id,color
ORDER BY 1,2,4,3

Upvotes: 1

Related Questions