Reputation: 29
I need to create some cycle in SAS proc-sql which divide data into groups. I have data
ID1 ID2 TIME GROUP
1234 12 22MAY2015:16:10:00.000 0
1234 12 22MAY2015:16:15:00.000 0
1234 12 12JUN2015:6:35:00.000 0
1234 12 12JUN2015:16:35:00.000 0
6549 45 15APR2015:16:10:00.000 0
6549 45 18APR2015:13:15:00.000 0
6549 45 18APR2015:13:18:00.000 0
6549 15 22MAY2015:14:15:00.000 0
6549 15 22MAY2015:14:20:00.000 0
and I need create new column GROUP where will be the same id for those rows which have same ID1, same ID2 and difference between TIME is max 10 minutes.
Result will be:
ID1 ID2 TIME GROUP
1234 12 22MAY2015:16:10:00.000 1
1234 12 22MAY2015:16:15:00.000 1
1234 12 12JUN2015:6:35:00.000 2
1234 12 12JUN2015:16:35:00.000 3
6549 45 15APR2015:16:10:00.000 4
6549 45 18APR2015:13:15:00.000 5
6549 45 18APR2015:13:18:00.000 5
6549 15 22MAY2015:14:15:00.000 6
6549 15 22MAY2015:14:20:00.000 6
I have tried to write some 'do while' cycle but it does not work.
data b;
set a;
time1 = time;
id1_1 = id1;
id2_1 = id2;
time2 = time;
id1_2 = id1;
id2_2 = id2;
group = group+1;
do while (id1_1 eq id1_2 id2_1 eq id2_2 floor((time2-time1)/60)<=10);
group = group;
time2 = time;
id1_2 = id1;
id2_2 = id2;
end;
run;
Thank you very much.
Upvotes: 1
Views: 569
Reputation: 906
Proc SQL is not a suitable tool for your problem. You have thought about 'Looping', which is a good start. However, the missing element in your code is the 'lagging' concept along with some other details. Your grouping condition has two parts: 1) Based on the natural group of ID1, ID2 2) On top of 1), additional group is generated if TIME is more than 10 minutes apart.
data have;
input (ID1 ID2) (:$8.) TIME:datetime23.;
format TIME:datetime23.;
cards;
1234 12 22MAY2015:16:10:00.000 1
1234 12 22MAY2015:16:15:00.000 1
1234 12 12JUN2015:6:35:00.000 2
1234 12 12JUN2015:16:35:00.000 3
6549 45 15APR2015:16:10:00.000 4
6549 45 18APR2015:13:15:00.000 5
6549 45 18APR2015:13:18:00.000 5
6549 15 22MAY2015:14:15:00.000 6
6549 15 22MAY2015:14:20:00.000 6
;
data want;
group+1; /*condition part 1*/
do until (last.id2);
set have;
by id1 id2 notsorted;
lag_time=lag(time);
if not first.id2 then group+intck('minute',lag_time,time)>10; /*condition part 2*/
output;
end;
drop lag_time;
run;
Upvotes: 2