Reputation: 169
I have a data set in SAS:
OBS CAR DATE_TIME
1 HON JAN-01-17 13:00
2 HON JAN-01-17 13:04
3 HON JAN-01-17 13:06
4 HON JAN-01-17 13:15
5 HON JAN-01-17 13:20
6 HON JAN-01-17 13:29
7 TOY JAN-01-17 13:05
8 TOY JAN-01-17 13:10
9 TOY JAN-01-17 13:39
The data represents event time stamp for a car type. I am trying to calculate the total number of events for any 10 minute interval for a specific car. Currently, I am doing it by adding another row which is 10 minutes plus the date time column and then joining the table with itself. Here is the code.
PROC SQL; CREATE TABLE WANT AS
SELECT A.OBS,A.CAR,A.DATE_TIME,A.DATE_TIME+(10*60) AS ENDTM
COUNT(B.OBS) AS TOTAL
FROM HAVE A LEFT JOIN HAVE B ON A.CAR=B.CAR AND B.DATE_TIME BETWEEN A.DATE_TIME AND B.ENDTM
GROUP BY A.OBS,A.CAR;QUIT;
This is the output I get:
OBS CAR DATE_TIME TOT
1 HON JAN-01-17 13:00 3
2 HON JAN-01-17 13:04 2
3 HON JAN-01-17 13:06 2
4 HON JAN-01-17 13:15 2
5 HON JAN-01-17 13:20 2
6 HON JAN-01-17 13:29 1
7 TOY JAN-01-17 13:05 2
8 TOY JAN-01-17 13:10 1
9 TOY JAN-01-17 13:39 1
Is there a more efficient way to do it using Data step ?
Thanks
Jay
Upvotes: 0
Views: 200
Reputation: 63424
One data step option is to use a temporary array and store the data in that as you see it, and then check what elements of the array still meet your needs. I do it here in the opposite direction as you show above (I am doing '10 minutes before') but you can just reverse-sort the data and do it in the direction you require (but change the intck
comparison around).
data have;
input @1 OBS 1. @6 CAR $3. @12 DATE_TIME anydtdtm15.;
format date_time datetime17.;
datalines;
1 HON JAN-01-17 13:00
2 HON JAN-01-17 13:04
3 HON JAN-01-17 13:06
4 HON JAN-01-17 13:15
5 HON JAN-01-17 13:20
6 HON JAN-01-17 13:29
7 TOY JAN-01-17 13:05
8 TOY JAN-01-17 13:10
9 TOY JAN-01-17 13:39
;;;;
run;
data want;
set have;
by car date_time;
array prev_times[20] _temporary_;
tot = 1;
do _i = dim(prev_times) to 1 by -1 while (not missing(prev_times[_i]));
if intck('minute',prev_times[_i], date_time) le 10 then do;
tot = tot + 1;
end;
else do;
call missing(prev_times[_i]);
end;
end;
prev_times[_i] = date_time;
call sortn(of prev_times[*]);
output;
if last.car then call missing(of prev_times[*]);
run;
Upvotes: 1
Reputation: 12849
Not a data step, but proc timeseries
will do it for you. Just convert your date into a datetime and use an interval of minute10.
.
data have;
input group$ date$ time$ tot;
month = scan(date, 1, '-');
day = scan(date, 2, '-');
year = scan(date, 3, '-');
datetime = input(cats(day, month, year, ':', time), datetime.);
format datetime datetime.;
datalines;
HON JAN-01-17 13:00 3
HON JAN-01-17 13:04 2
HON JAN-01-17 13:06 2
HON JAN-01-17 13:15 2
HON JAN-01-17 13:20 2
HON JAN-01-17 13:29 1
TOY JAN-01-17 13:05 2
TOY JAN-01-17 13:10 1
TOY JAN-01-17 13:39 1
;
run;
proc timeseries data=have out=want;
by group;
id datetime interval=minute10.;
var tot / accumulate=total;
run;
Upvotes: 0