learnlearn10
learnlearn10

Reputation: 169

Rolling Numbers by interval SAS EG

I have a data set in SAS:

OBS  CAR       DATE_TIME
1    HON   JAN-01-17  13:00
2    HON   JAN-01-17  13:04
3    HON   JAN-01-17  13:06 
4    HON   JAN-01-17  13:15
5    HON   JAN-01-17  13:20
6    HON   JAN-01-17  13:29
7    TOY   JAN-01-17  13:05
8    TOY   JAN-01-17  13:10
9    TOY   JAN-01-17  13:39

The data represents event time stamp for a car type. I am trying to calculate the total number of events for any 10 minute interval for a specific car. Currently, I am doing it by adding another row which is 10 minutes plus the date time column and then joining the table with itself. Here is the code.

PROC SQL; CREATE TABLE WANT AS 
SELECT A.OBS,A.CAR,A.DATE_TIME,A.DATE_TIME+(10*60) AS ENDTM
COUNT(B.OBS) AS TOTAL
FROM HAVE A LEFT JOIN HAVE B ON A.CAR=B.CAR AND B.DATE_TIME BETWEEN A.DATE_TIME AND B.ENDTM
GROUP BY A.OBS,A.CAR;QUIT;

This is the output I get:

OBS  CAR       DATE_TIME       TOT
1    HON   JAN-01-17  13:00     3
2    HON   JAN-01-17  13:04     2
3    HON   JAN-01-17  13:06     2
4    HON   JAN-01-17  13:15     2
5    HON   JAN-01-17  13:20     2
6    HON   JAN-01-17  13:29     1
7    TOY   JAN-01-17  13:05     2
8    TOY   JAN-01-17  13:10     1
9    TOY   JAN-01-17  13:39     1

Is there a more efficient way to do it using Data step ?

Thanks
Jay

Upvotes: 0

Views: 200

Answers (2)

Joe
Joe

Reputation: 63424

One data step option is to use a temporary array and store the data in that as you see it, and then check what elements of the array still meet your needs. I do it here in the opposite direction as you show above (I am doing '10 minutes before') but you can just reverse-sort the data and do it in the direction you require (but change the intck comparison around).

data have;
input @1 OBS 1. @6 CAR $3. @12 DATE_TIME anydtdtm15.;
format date_time datetime17.;
datalines;
1    HON   JAN-01-17 13:00
2    HON   JAN-01-17 13:04
3    HON   JAN-01-17 13:06
4    HON   JAN-01-17 13:15
5    HON   JAN-01-17 13:20
6    HON   JAN-01-17 13:29
7    TOY   JAN-01-17 13:05
8    TOY   JAN-01-17 13:10
9    TOY   JAN-01-17 13:39
;;;;
run;

data want;
  set have;
  by car date_time;
  array prev_times[20] _temporary_;
  tot = 1;
  do _i = dim(prev_times) to 1 by -1 while (not missing(prev_times[_i]));
   if intck('minute',prev_times[_i], date_time) le 10 then do;
      tot = tot + 1;
    end;
    else do;
      call missing(prev_times[_i]);
    end;
  end;
  prev_times[_i] = date_time; 
  call sortn(of prev_times[*]);
  output;
  if last.car then call missing(of prev_times[*]);

run;

Upvotes: 1

Stu Sztukowski
Stu Sztukowski

Reputation: 12849

Not a data step, but proc timeseries will do it for you. Just convert your date into a datetime and use an interval of minute10..

data have;
    input group$ date$ time$ tot;

    month = scan(date, 1, '-');
    day = scan(date, 2, '-');
    year = scan(date, 3, '-');

    datetime = input(cats(day, month, year, ':', time), datetime.);

    format datetime datetime.;

    datalines;
HON JAN-01-17 13:00 3
HON JAN-01-17 13:04 2
HON JAN-01-17 13:06 2
HON JAN-01-17 13:15 2
HON JAN-01-17 13:20 2
HON JAN-01-17 13:29 1
TOY JAN-01-17 13:05 2
TOY JAN-01-17 13:10 1
TOY JAN-01-17 13:39 1
;
run;

proc timeseries data=have out=want;
    by group;
    id datetime interval=minute10.;
    var tot / accumulate=total;
run;

Upvotes: 0

Related Questions