Vendula
Vendula

Reputation: 29

divide data into groups in SAS proc-sql

I need to create some cycle in SAS proc-sql which divide data into groups. I have data

ID1     ID2 TIME                    GROUP
1234    12  22MAY2015:16:10:00.000  0
1234    12  22MAY2015:16:15:00.000  0
1234    12  12JUN2015:6:35:00.000   0
1234    12  12JUN2015:16:35:00.000  0
6549    45  15APR2015:16:10:00.000  0
6549    45  18APR2015:13:15:00.000  0
6549    45  18APR2015:13:18:00.000  0
6549    15  22MAY2015:14:15:00.000  0
6549    15  22MAY2015:14:20:00.000  0

and I need create new column GROUP where will be the same id for those rows which have same ID1, same ID2 and difference between TIME is max 10 minutes.

Result will be:

ID1     ID2 TIME                    GROUP
1234    12  22MAY2015:16:10:00.000  1
1234    12  22MAY2015:16:15:00.000  1
1234    12  12JUN2015:6:35:00.000   2
1234    12  12JUN2015:16:35:00.000  3
6549    45  15APR2015:16:10:00.000  4
6549    45  18APR2015:13:15:00.000  5
6549    45  18APR2015:13:18:00.000  5
6549    15  22MAY2015:14:15:00.000  6
6549    15  22MAY2015:14:20:00.000  6

I have tried to write some 'do while' cycle but it does not work.

data b;
set a;
time1 = time;
id1_1 = id1;
id2_1 = id2;
time2 = time;
id1_2 = id1;
id2_2 = id2;
group = group+1;
do while (id1_1 eq id1_2 id2_1 eq id2_2 floor((time2-time1)/60)<=10);
group = group;
time2 = time;
id1_2 = id1;
id2_2 = id2;
end;
run;

Thank you very much.

Upvotes: 1

Views: 569

Answers (1)

Haikuo Bian
Haikuo Bian

Reputation: 906

Proc SQL is not a suitable tool for your problem. You have thought about 'Looping', which is a good start. However, the missing element in your code is the 'lagging' concept along with some other details. Your grouping condition has two parts: 1) Based on the natural group of ID1, ID2 2) On top of 1), additional group is generated if TIME is more than 10 minutes apart.

data have;
    input (ID1     ID2) (:$8.) TIME:datetime23.;
    format TIME:datetime23.;
    cards;
1234    12  22MAY2015:16:10:00.000  1
1234    12  22MAY2015:16:15:00.000  1
1234    12  12JUN2015:6:35:00.000   2
1234    12  12JUN2015:16:35:00.000  3
6549    45  15APR2015:16:10:00.000  4
6549    45  18APR2015:13:15:00.000  5
6549    45  18APR2015:13:18:00.000  5
6549    15  22MAY2015:14:15:00.000  6
6549    15  22MAY2015:14:20:00.000  6
;

data want;
    group+1; /*condition part 1*/

    do until (last.id2);
        set have;
        by id1 id2 notsorted;
        lag_time=lag(time);

        if not first.id2 then group+intck('minute',lag_time,time)>10; /*condition part 2*/
            output;
    end;

    drop lag_time;
run;

Upvotes: 2

Related Questions