lrk889

Reputation: 119

SAS - Making a new variable if conditions match another variable?

I would like to add a new variable to my dataset: a binary flag for whether a patient has a tobacco-related disease. I am looking at patient data in which each patient has up to 9 disease codes (diag1 through diag9). I have a dataset called tobacco that stores all the tobacco disease codes.

This is what I thought I could do:

data outpreg;
set outpreg;
if diag1 = tobacco OR diag2 = tobacco OR diag3 = tobacco or diag4 = tobacco or diag5 = tobacco or diag6 = tobacco or 
diag7 = tobacco or diag8 = tobacco or diag9 = tobacco then co2=1;
run;

But this flags too many patients to be correct. Any help would be greatly appreciated.

Upvotes: 0

Views: 202

Answers (1)

DWal

Reputation: 2762

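Your code isn't doing what you want. It compares the value of diag1 (and the other diag variables) to a variable named tobacco in the same outpreg data set, not to the tobacco dataset. Since outpreg has no variable named tobacco, SAS creates a new variable tobacco and initializes it to missing (.). That is probably why you get too many matches: any record with a blank diag value compares equal to that missing value. To do what you want, I would join the outpreg data set to the tobacco dataset once for each diag variable (here I assume the code column in tobacco is named tobacco_cd).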

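proc sql;
  /* Save the result as a new table instead of only printing it */
  create table outpreg2 as
  select
    o.*,
    /* A boolean expression evaluates to 1 (true) or 0 (false) in SAS */
    (t1.tobacco_cd is not null or
     t2.tobacco_cd is not null or
     t3.tobacco_cd is not null) as co2
  from
    outpreg as o
    /* One left join per diag variable; no match leaves tobacco_cd null */
    left join tobacco as t1
      on o.diag1 = t1.tobacco_cd
    left join tobacco as t2
      on o.diag2 = t2.tobacco_cd
    left join tobacco as t3
      on o.diag3 = t3.tobacco_cd
  ;
quit;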

This checks each diag variable against the list of codes, setting co2 to 1 if any of them matches and 0 if none does. For example, if diag1 matches, then t1.tobacco_cd is not null is true, and the entire expression evaluates to 1.

You'd have to expand it to cover all nine of your variables instead of just three.

Another option is to put your tobacco codes into a format like Joe suggested in this question.

proc format;
  /* Character format: listed codes map to 'Tobacco', everything else to 'Not Tobacco' */
  value $tobaccocd
    '30300','30301','30302','30303' = 'Tobacco'
    other = 'Not Tobacco';
run;
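If the codes already live in the tobacco dataset, the format could probably be built from that dataset rather than typed out by hand. The following is only a sketch: it assumes the code column is named tobacco_cd, that it is a character variable, and that its values are unique (duplicates would need to be removed first). The dataset name tobacco_fmt is made up for illustration.

```sas
/* Build a control dataset for PROC FORMAT from the tobacco code list.  */
/* Assumes: tobacco has a unique character column named tobacco_cd.     */
data tobacco_fmt;
  length label $11;                  /* long enough for 'Not Tobacco'   */
  set tobacco end=last;
  retain fmtname '$tobaccocd' type 'C';
  start = tobacco_cd;                /* each code maps to 'Tobacco'     */
  label = 'Tobacco';
  output;
  if last then do;                   /* add the catch-all OTHER range   */
    hlo = 'O';
    label = 'Not Tobacco';
    output;
  end;
  keep fmtname type start label hlo;
run;

/* Create the $tobaccocd format from the control dataset */
proc format cntlin=tobacco_fmt;
run;
```

This defines the same $tobaccocd format as the hand-written version, so only one of the two is needed.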

Then you could create your co2 variable in a data step like this:

data outpreg2;
  set outpreg;
  if put(diag1,$tobaccocd.) = 'Tobacco' or
     put(diag2,$tobaccocd.) = 'Tobacco' or
     put(diag3,$tobaccocd.) = 'Tobacco' then co2 = 1;
  else co2 = 0;
run;
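To cover all nine diag variables without writing nine conditions, an array loop is one possible way to do it. This is a minimal sketch that assumes diag1 through diag9 are character variables and that the $tobaccocd format above has already been created; the array name dx and the loop counter i are arbitrary.

```sas
data outpreg2;
  set outpreg;
  array dx{9} diag1-diag9;     /* refer to the nine existing diag variables */
  co2 = 0;                     /* default: no tobacco code found            */
  do i = 1 to dim(dx);
    if put(dx{i}, $tobaccocd.) = 'Tobacco' then do;
      co2 = 1;
      leave;                   /* stop at the first match                   */
    end;
  end;
  drop i;                      /* the loop counter is not needed in output  */
run;
```

Blank diag values fall into the other category of the format, so they do not set co2 to 1.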

Upvotes: 1
