Kul
Kul

Reputation: 73

Categorization of non-numeric data using SAS

I wonder if there is a way to categorize data when it is not numeric. Is there a way to specify all the conditions for the if-then statements in single line ?

Here is part of my code.

data new;
set old;
if target EQ 'purchase|laboratory|dept' then category = 'internal';
if target EQ 'purchase|office|member' then category ='internal';
if target EQ 'purchase|floor|ext' then category='external';
run;

Upvotes: 0

Views: 121

Answers (1)

Richard
Richard

Reputation: 27498

Kul:

You can use if / then / else logic to perform all the assignments in a single statement

if target EQ 'purchase|laboratory|dept' then category = 'internal'; else
if target EQ 'purchase|office|member' then category ='internal'; else
if target EQ 'purchase|floor|ext' then category='external';

Long runs of your if then else can be equivalently stated in a select statement

select (target);
  when ('purchase|laboratory|dept') category = 'internal';
  when ('purchase|office|member') category = 'internal';
  when ('purchase|floor|ext') category = 'external';
  otherwise category = 'other';
end;

The parenthesis are required.

A custom format can also be used for the case of many values mapping to a few category values, and for the case of processing target alone as a formatted categorical (for example class target; format target $target_cat.;). The benefit is that the mapping is stored as data instead of as SAS source code.

* mapping data;

data target_categories;
  length target $60 category $20;
  input target & category; datalines;
purchase|laboratory|dept  internal
purchase|office|member    internal
purchase|floor|ext        external
run;

* conform mapping data to proc format cntlin= requirements;

data format_data;
  set target_categories;
  start = target;
  label = category;
  fmtname = '$target_cat';
run;

* construct custom format;

proc format cntlin=format_data;
run;

Sample data

data old;
  do x = 1 to 20;
    target = 'purchase|laboratory|dept'; output;
    target = 'purchase|office|member'; output;
    target = 'purchase|floor|ext'; output;
  end;
run;

Apply format using put

data new;
  set old;
  category = put (target,$target_cat.);
run;

You can also process the data with out creating a second variable. For example

proc tabulate data=old;
  class target;
  format target $target_cat.;  * reported values will show internal/external;
  table target, n / nocellmerge;
run;

Upvotes: 3

Related Questions