Reputation: 13
Can anyone help with this issue I'm having where the macro is only taking the final row value of the data?
I have some data that looks like this:
data data1 ;
infile datalines dsd dlm='|' truncover;
input id :$2. year_age_15 EDU_2000 EDU_2001 EDU_2002 ;
datalines4;
10|2000|3|4|5
11|2000|5|5|6
12|2001|1|2|3
13|2002|5|5|6
14|2001|2|2|2
15|2000|3|3|4
;;;;
However I need it to use the year variable to determine which data to keep, and then change all the values for the years after that value to missing, like so:
data data1 ;
infile datalines dsd dlm='|' truncover;
input id :$2. year_age_15 EDU_2000 EDU_2001 EDU_2002 ;
datalines4;
10|2000|3|.|.
11|2000|5|.|.
12|2001|1|2|.
13|2002|5|5|6
14|2001|2|2|.
15|2000|3|.|.
;;;;
I've been trying to get this macro to work, but it only works intermittently and works just for the final row of the data rather than looping through the rows.
%macro macro2 (output=, input=);
data &output;
set &input;
%DO I = 1 %TO 6;
%do; call symput('value2',trim(left(put(year_age_15,8.))));
temp_col=&value2.;
%let year_end=&value2.;
%put YEAR END IS: &year_end.;
%put EDU YEAR IS: EDU_&year_end.;
%do year = &year_end. %TO 2002;
%put &year.;
EDU_&year.=.;
%end;
%end;
%end;
run;
%MEND macro2;
%macro1(input=testset, output=output_testset);
In R it could be something simple like :
for(i in 1:6){.
do this
}
Any advice? I can't figure out which bit is going wrong, thanks!
Upvotes: 1
Views: 1456
Reputation: 27508
As @Joe mentions, the year to match is part of a variable name, which is tremor inducing 'data in the metadata'
You can use the VNAME
to retrieve the variable name of an index accessed array element. Use that feature to compare to expected variable name whilst looping over a variable array based on variables named EDU*
.
Example:
data have ;
infile datalines dsd dlm='|' truncover;
input id :$2. year_age_15 EDU_2000 EDU_2001 EDU_2002 ;
datalines4;
10|2000|3|4|5
11|2000|5|5|6
12|2001|1|2|3
13|2002|5|5|6
14|2001|2|2|2
15|2000|3|3|4
;;;;
data want;
set have;
array edus edu_:;
* find index of element corresponding to variable name having year;
do _n_ = 1 to dim(edus) until (upcase(vname(edus(_n_))) = cats('EDU_',year_age_15));
end;
* fill in elements at indices post the found one with missing values;
do _n_ = _n_+1 to dim(edus);
call missing(edus(_n_));
end;
run;
Upvotes: 0
Reputation: 21274
Create an array and index it by years rather than default 1:n
Loop through your array starting at year+1 and set to missing
data want;
set data1;
array educ(2000:2002) edu_2000-edu_2002;
if (year_age_15 +1) <= hbound(educ) then do i= (year_age_15 +1) to hbound(educ);
call missing(educ(i));
end;
run;
Upvotes: 1
Reputation: 63424
So, I think the issue here is your data is at the wrong level. You certainly can do what Reeza suggests, and I think it's probably reasonable to do so, but the reason why this is a bit complicated is that you have data in your variable name. That's not a best practice - your variable name should be "education" and your data should have a row for each year. Then this would be a simple WHERE statement!
Here's a simple PROC TRANSPOSE
that turns it to the right structure, and then if you really need it the other way, a second one will turn it back. The where
statement can be in the proc transpose
or could be used somewhere else.
proc transpose data=data1 out=data_t (where=(year_Age_15 ge input(scan(_NAME_,2,'_'),4.)));
by id year_Age_15;
var edu_:;
run;
proc transpose data=data_t out=want;
by id year_age_15;
id _name_;
var col1;
run;
Upvotes: 1