Reputation: 49
I am trying to sum variables in an array based on start and end date. For each ID there is one row (if the start and end date are within the same year), two rows (if the start and end date are within consecutive years) or multiple rows for different periods of start and end dates. There are 12 variables with counts for each month i.e. v1-v12 where v1 is january and v12 is december. The two rows for some ID contain monthly values for the 2 consecutive years i.e. within the stat year and the end year. I am trying to get the sum count for the array variables but only from the start date to the end date for each ID. For example, for ID 1 the start date is 07/23/2007 and end date is 06/07/2008, i would like to sum from V7 (july start month) to v12 in 2007 and V1 to V6 (june end month) in 2008 i.e. second row. Here's what I have:
ID STARTDATE ENDDATE YR V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1 07/23/2007 06/07/2008 2007 3 5 2 6 3 2 1 3 4 1 2 3
1 07/23/2007 06/07/2008 2008 0 4 2 2 3 0 1 3 1 0 2 3
2 02/01/2002 07/27/2002 2002 1 0 2 3 1 0 1 2 3 0 0 2
3 05/26/2008 03/07/2009 2008 2 0 2 3 1 2 1 1 3 0 0 1
3 05/26/2008 03/07/2009 2009 4 1 4 3 1 0 2 3 3 1 0 3
3 10/17/2011 08/17/2012 2011 3 3 0 1 0 1 1 5 3 1 0 1
3 10/17/2011 08/17/2012 2012 1 3 2 3 1 0 1 2 3 2 0 2
4 02/27/2004 01/22/2005 2004 2 0 2 3 1 2 1 1 3 0 0 1
4 02/27/2004 01/22/2005 2005 0 4 2 2 3 0 1 3 1 0 2 3
and this is what I want :
ID STARTDATE ENDDATE YR V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 sum
1 07/23/2007 06/07/2008 2007 3 5 2 6 3 2 [1 3 4 1 2 3] 25
1 07/23/2007 06/07/2008 2008 [0 4 2 2 3 0] 1 3 1 0 2 3 25
2 02/01/2002 07/27/2002 2002 1 [0 2 3 1 0 1] 2 3 0 0 2 8
3 05/26/2008 03/07/2009 2008 2 0 2 3 [1 2 1 1 3 0 0 1] 18
3 05/26/2008 03/07/2009 2009 [4 1 4] 3 1 0 2 3 3 1 0 3 18
3 10/17/2011 08/17/2012 2011 3 3 0 1 0 1 1 5 3 [1 0 1] 15
3 10/17/2011 08/17/2012 2011 [1 3 2 3 1 0 1 2] 3 2 0 2 15
4 02/27/2004 01/22/2005 2004 2 [0 2 3 1 2 1 1 3 0 0 1] 14
4 02/27/2004 01/22/2005 2005 [0] 4 2 2 3 0 1 3 1 0 2 3 14
Here's the code I tried
data want;
set have;
array vars(*) V1-V12;
DT_CHECK=intnx('month',ENDDATE,-12);
start=intck('month','STARTDATE,DT_CHECK)+1;
if start<1 then do;
error 'Start date out of range';
delete;
end;
else if start>dim(vars)-12 then do;
error 'End date out of range';
delete;
end;
do _N_=start to start+12;
sum_n+vars(_N_);
end;
format DT_CHECK mmddyy10.;
run;
But am having problems. Any help is appreciated. Thank you.
Upvotes: 0
Views: 550
Reputation: 51566
Since each observation represents one year a straight forward approach would be to just loop month from Jan to Dec and check if that month falls within your date range.
data want;
do until(last.startdate);
set have;
by id startdate;
array v v1-v12;
do month=1 to 12 ;
if intnx('month',startdate,0,'b')<=mdy(month,1,yr)<=intnx('month',enddate,0,'e')
then sum=sum(sum,v[month])
;
end;
end;
keep id startdate enddate sum;
run;
Results:
Obs ID STARTDATE ENDDATE sum
1 1 2007-07-23 2008-06-07 25
2 2 2002-02-01 2002-07-27 7
3 3 2008-05-26 2009-03-07 18
4 3 2011-10-17 2012-08-17 15
5 4 2004-02-27 2005-01-22 14
Upvotes: 1
Reputation: 27498
A DOW / serial loop technique can compute a value for criteria over a group, and then apply that value to each row in group.
Example:
Requires the start to end date intervals within an id
be mutually exclusive (i.e. do not overlap and data are sorted by id startdate enddate
)
data want;
* [sum] variable is implicitly reset to missing at the top of the step.;
do _n_ = 1 by 1 until (last.enddate);
set have;
by id startdate enddate;
array v(12);
_month1 = intnx('month', startdate, 0);
_month2 = intnx('month', enddate, 0);
do _index = 1 to 12;
if _month1 <= mdy(_index,1,yr) <= _month2 then sum = sum(sum,v(_index));
end;
end;
do _n_ = 1 to _n_;
set have;
output;
end;
format sum 4.;
drop _:;
run;
The answer does not address the scenario of startdate
to enddate
intervals that overlap within an id
.
Upvotes: 1