tassles
tassles

Reputation: 13

SAS: How to loop a macro over rows of data to change to missing

Can anyone help with this issue I'm having where the macro is only taking the final row value of the data?

I have some data that looks like this:

data data1 ;  
  infile datalines dsd dlm='|' truncover;  
  input id :$2. year_age_15 EDU_2000 EDU_2001 EDU_2002 ;  
datalines4;  
10|2000|3|4|5  
11|2000|5|5|6  
12|2001|1|2|3  
13|2002|5|5|6  
14|2001|2|2|2  
15|2000|3|3|4   
;;;;  

However I need it to use the year variable to determine which data to keep, and then change all the values for the years after that value to missing, like so:

data data1 ;  
  infile datalines dsd dlm='|' truncover;  
  input id :$2. year_age_15 EDU_2000 EDU_2001 EDU_2002 ;  
datalines4;  
10|2000|3|.|.   
11|2000|5|.|.   
12|2001|1|2|.  
13|2002|5|5|6  
14|2001|2|2|.  
15|2000|3|.|.  
;;;;  

I've been trying to get this macro to work, but it only works intermittently and works just for the final row of the data rather than looping through the rows.

%macro macro2 (output=, input=);  
data &output;  
set &input;  
%DO I = 1 %TO 6;  
%do; call symput('value2',trim(left(put(year_age_15,8.))));   
temp_col=&value2.;  
%let year_end=&value2.;  
%put YEAR END IS: &year_end.;  
%put EDU YEAR IS:  EDU_&year_end.;  
%do year = &year_end. %TO 2002;  
%put &year.;  
EDU_&year.=.;  
%end;  
%end;  
%end;  
run;  
%MEND macro2;  
%macro1(input=testset, output=output_testset);  

In R it could be something simple like :
for(i in 1:6){. do this }

Any advice? I can't figure out which bit is going wrong, thanks!

Upvotes: 1

Views: 1456

Answers (3)

Richard
Richard

Reputation: 27508

As @Joe mentions, the year to match is part of a variable name, which is tremor inducing 'data in the metadata'

You can use the VNAME to retrieve the variable name of an index accessed array element. Use that feature to compare to expected variable name whilst looping over a variable array based on variables named EDU*.

Example:

data have ;  
  infile datalines dsd dlm='|' truncover;  
  input id :$2. year_age_15 EDU_2000 EDU_2001 EDU_2002 ;  
datalines4;  
10|2000|3|4|5  
11|2000|5|5|6  
12|2001|1|2|3  
13|2002|5|5|6  
14|2001|2|2|2  
15|2000|3|3|4   
;;;;

data want;
  set have;
  array edus edu_:;

  * find index of element corresponding to variable name having year;
  do _n_ = 1 to dim(edus) until (upcase(vname(edus(_n_))) = cats('EDU_',year_age_15));
  end;

  * fill in elements at indices post the found one with missing values;
  do _n_ = _n_+1 to dim(edus);
    call missing(edus(_n_));
  end;  
run;

enter image description here

Upvotes: 0

Reeza
Reeza

Reputation: 21274

  1. Create an array and index it by years rather than default 1:n

  2. Loop through your array starting at year+1 and set to missing

    data want;
     set data1;
     array educ(2000:2002) edu_2000-edu_2002;
     if (year_age_15 +1) <= hbound(educ) then do i= (year_age_15 +1) to hbound(educ);
        call missing(educ(i));
     end;
     run;
    

Upvotes: 1

Joe
Joe

Reputation: 63424

So, I think the issue here is your data is at the wrong level. You certainly can do what Reeza suggests, and I think it's probably reasonable to do so, but the reason why this is a bit complicated is that you have data in your variable name. That's not a best practice - your variable name should be "education" and your data should have a row for each year. Then this would be a simple WHERE statement!

Here's a simple PROC TRANSPOSE that turns it to the right structure, and then if you really need it the other way, a second one will turn it back. The where statement can be in the proc transpose or could be used somewhere else.

proc transpose data=data1 out=data_t (where=(year_Age_15 ge input(scan(_NAME_,2,'_'),4.)));
  by id year_Age_15;
  var edu_:;
run;

proc transpose data=data_t out=want;
  by id year_age_15;
  id _name_;
  var col1;
run;

Upvotes: 1

Related Questions