Damien

Reputation: 392

Dummy variables in SAS

Suppose we have a data set people with a categorical variable income that has 4 levels (1, 2, 3, 4). How would we create dummy variables for it in SAS? Would it be:

data people;
set people;
if income=1 then income1=1;
else if income=2 then income2=1;
else if income=3 then income3=1;
run;

In other words, this would create three dummy variables for the four levels. Is this right?

Upvotes: 4

Views: 9741

Answers (5)

Harshad Patil

Reputation: 313

Code:

proc sql noprint;
 select distinct 'income' || strip(put(income,8.)) into :income_var    separated by ' '
 from people;
quit;

data people(rename = (in = income));
 set people(rename = (income = in));
 length &income_var. 8;
 array tmp_arr(*) income:;
 do i = 1 to dim(tmp_arr);
    if in eq i then tmp_arr(i) = 1;
    else tmp_arr(i) = 0;
 end;
 drop i;
run;

How it works: the code above is dynamic. The PROC SQL step builds one variable name per distinct level of income in the input people data set, so the number of dummy variables adapts to the data. Note that the data step compares income with the array index, so it assumes the levels are the consecutive integers 1 to n.

The data step sets the variable matching the value of income to 1 and all the others to 0.
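If the levels are not the consecutive integers 1 to n, comparing income with the array index no longer lines up with the variable names. A minimal sketch of one way around this, assuming the PROC SQL step has already populated &income_var., that every level is a non-missing positive integer, and using a hypothetical output name people_dummy, is to recover each level from the dummy variable's own name:

```sas
data people_dummy;                     /* hypothetical output name */
  set people;
  /* Naming the variables explicitly avoids the income: prefix list, */
  /* so income itself does not need to be renamed.                   */
  array tmp_arr(*) &income_var.;
  do i = 1 to dim(tmp_arr);
    /* "income" is 6 characters, so the level starts at position 7   */
    /* of the variable name, e.g. income3 -> 3.                      */
    tmp_arr(i) = (income = input(substr(vname(tmp_arr(i)), 7), 8.));
  end;
  drop i;
run;
```

With levels 1, 2, 5, for example, this fills income1, income2 and income5 rather than testing for the values 1, 2, 3.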

Upvotes: 0

Amrita Sawant

Reputation: 10913

You need not write "else". The following will also work:

    income1_ind=(income eq 1);
    income2_ind=(income eq 2);
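As a fuller sketch of the Boolean-expression approach, assuming the people data set and income levels 1 to 4 from the question (the output name people_dummy and the choice of level 4 as the reference are only illustrative):

```sas
data people_dummy;                     /* hypothetical output name */
  set people;
  /* A comparison in SAS evaluates to 1 (true) or 0 (false).        */
  /* The missing() guard leaves the dummies missing, rather than 0, */
  /* when income itself is missing.                                 */
  if not missing(income) then do;
    income1 = (income eq 1);
    income2 = (income eq 2);
    income3 = (income eq 3);
  end;
run;
```

Without the guard, a missing income would produce 0 in every dummy and be indistinguishable from the reference level.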

Upvotes: 1

vibowit

Reputation: 11

And I would write something more general.

%macro cat(indata, variable);
  proc sql noprint;
    select distinct &variable. into :mvals separated by '|'
    from &indata.;

    %let mdim=&sqlobs;
  quit;

  data &indata.;
    set &indata.;
    %do _i=1 %to &mdim.;
      %let _v = %scan(&mvals., &_i., |);
      if &variable. = &_v. then &variable.&_v. = 1; else &variable.&_v = 0;
    %end;
  run;
%mend;

%cat(people, income);

Upvotes: 1

Joe

Reputation: 63424

A somewhat more flexible way to do it is with arrays.

data people;
set people;
array incomes income1-income4;
do _t = 1 to dim(incomes);
  if income=_t then incomes[_t] = 1;
  else if not missing(income) then incomes[_t] = 0;
  else incomes[_t] = .;
end;
run;

Upvotes: 6

forecaster

Reputation: 1159

I have modified your code below. This gives three dummy-coded variables; income = 4 is the reference level.

data people_dummy;
         set people;
         if income=1 then income1=1 ; else income1=0;
         if income=2 then income2=1 ; else income2=0; 
         if income=3 then income3=1 ; else income3=0;
run;
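One way to check the coding, sketched here under the assumption that the step above has produced people_dummy, is to cross-tabulate income against the three dummies:

```sas
/* Each income level should map to exactly one pattern of dummies; */
/* income = 4 (the reference level) should show 0, 0, 0.           */
proc freq data=people_dummy;
  tables income*income1*income2*income3 / list missing;
run;
```

The list option prints one row per observed combination, and missing includes rows where income is missing, so a miscoded level is easy to spot.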

Upvotes: 1
