stunt
stunt

Reputation: 49

Summing across SAS arrays based on intervals i.e. start date and end date

I am trying to sum variables in an array based on start and end date. For each ID there is one row (if the start and end date are within the same year), two rows (if the start and end date are within consecutive years) or multiple rows for different periods of start and end dates. There are 12 variables with counts for each month i.e. v1-v12 where v1 is january and v12 is december. The two rows for some ID contain monthly values for the 2 consecutive years i.e. within the stat year and the end year. I am trying to get the sum count for the array variables but only from the start date to the end date for each ID. For example, for ID 1 the start date is 07/23/2007 and end date is 06/07/2008, i would like to sum from V7 (july start month) to v12 in 2007 and V1 to V6 (june end month) in 2008 i.e. second row. Here's what I have:

ID  STARTDATE   ENDDATE      YR    V1  V2  V3  V4  V5  V6  V7  V8  V9  V10 V11 V12
1   07/23/2007  06/07/2008  2007    3   5   2   6   3   2   1   3   4   1   2   3
1   07/23/2007  06/07/2008  2008    0   4   2   2   3   0   1   3   1   0   2   3
2   02/01/2002  07/27/2002  2002    1   0   2   3   1   0   1   2   3   0   0   2
3   05/26/2008  03/07/2009  2008    2   0   2   3   1   2   1   1   3   0   0   1
3   05/26/2008  03/07/2009  2009    4   1   4   3   1   0   2   3   3   1   0   3
3   10/17/2011  08/17/2012  2011    3   3   0   1   0   1   1   5   3   1   0   1
3   10/17/2011  08/17/2012  2012    1   3   2   3   1   0   1   2   3   2   0   2
4   02/27/2004  01/22/2005  2004    2   0   2   3   1   2   1   1   3   0   0   1
4   02/27/2004  01/22/2005  2005    0   4   2   2   3   0   1   3   1   0   2   3

and this is what I want :

ID  STARTDATE   ENDDATE      YR    V1  V2  V3  V4  V5  V6  V7  V8  V9  V10  V11  V12  sum
1   07/23/2007  06/07/2008  2007    3   5   2   6   3   2  [1   3   4    1    2    3]  25
1   07/23/2007  06/07/2008  2008   [0   4   2   2   3   0]  1   3   1    0    2    3   25
2   02/01/2002  07/27/2002  2002    1  [0   2   3   1   0   1]  2   3    0    0    2    8
3   05/26/2008  03/07/2009  2008    2   0   2   3  [1   2   1   1   3    0    0    1]  18
3   05/26/2008  03/07/2009  2009   [4   1   4]  3   1   0   2   3   3    1    0    3   18
3   10/17/2011  08/17/2012  2011    3   3   0   1   0   1   1   5   3   [1    0    1]  15
3   10/17/2011  08/17/2012  2011   [1   3   2   3   1   0   1   2]  3    2    0    2   15
4   02/27/2004  01/22/2005  2004    2  [0   2   3   1   2   1   1   3    0    0    1]  14
4   02/27/2004  01/22/2005  2005   [0]  4   2   2   3   0   1   3   1    0    2    3   14

Here's the code I tried

data want;
  set have;
  array vars(*) V1-V12;
  DT_CHECK=intnx('month',ENDDATE,-12);
  start=intck('month','STARTDATE,DT_CHECK)+1;
  if start<1 then do;
    error 'Start date out of range';
    delete;
    end;
  else if start>dim(vars)-12 then do;
    error 'End date out of range';
    delete;
    end;
  do _N_=start to start+12;
    sum_n+vars(_N_);
    end;
  format DT_CHECK mmddyy10.;
run;

But am having problems. Any help is appreciated. Thank you.

Upvotes: 0

Views: 550

Answers (2)

Tom
Tom

Reputation: 51566

Since each observation represents one year a straight forward approach would be to just loop month from Jan to Dec and check if that month falls within your date range.

data want;
 do until(last.startdate);
   set have;
   by id startdate;
   array v v1-v12;
   do month=1 to 12 ;
     if intnx('month',startdate,0,'b')<=mdy(month,1,yr)<=intnx('month',enddate,0,'e')
       then sum=sum(sum,v[month])
     ;
   end;
 end;
 keep id startdate enddate sum;
run;

Results:

Obs    ID     STARTDATE       ENDDATE    sum

 1     1     2007-07-23    2008-06-07     25
 2     2     2002-02-01    2002-07-27      7
 3     3     2008-05-26    2009-03-07     18
 4     3     2011-10-17    2012-08-17     15
 5     4     2004-02-27    2005-01-22     14

Upvotes: 1

Richard
Richard

Reputation: 27498

A DOW / serial loop technique can compute a value for criteria over a group, and then apply that value to each row in group.

Example:

Requires the start to end date intervals within an id be mutually exclusive (i.e. do not overlap and data are sorted by id startdate enddate)

data want;
  * [sum] variable is implicitly reset to missing at the top of the step.;

  do _n_ = 1 by 1 until (last.enddate);
    set have;
    by id startdate enddate;

    array v(12);

    _month1 = intnx('month', startdate, 0);
    _month2 = intnx('month', enddate,   0);

    do _index = 1 to 12;
      if _month1 <= mdy(_index,1,yr) <= _month2 then sum = sum(sum,v(_index));
    end;
  end;

  do _n_ = 1 to _n_;
    set have;
    output;
  end;

  format sum 4.;
  drop _:;
run;

The answer does not address the scenario of startdate to enddate intervals that overlap within an id.

Upvotes: 1

Related Questions