k.dkhk
k.dkhk

Reputation: 530

SAS: how to change the order of a categorical variable

To make it simple let's say I have a dataset consiting of four names: Anna, Bobby , Casper, Christine. The column name is just 'Names'.

I want to sort it in this order: Bobby, Anna, Casper, Christine. I cant use 'proc sort' with 'asc/desc' here. Because it's ordered randomly I need to type in the order manually.

Can I somehow include it the following proc sort statement?

 Proc Sort
  data = dataset; order by Names;
run;

Upvotes: 1

Views: 4113

Answers (2)

Longfish
Longfish

Reputation: 7602

The answer from @Joe is the best way due to being scalable and the ability to read in formats from a dataset using CNTLIN. I just thought I'd post an alternative solution, using proc sql. The version SAS uses enables you to create a custom order on the fly by adding an order by statement along with a case statement. Effectively this creates an extra column in memory which is used to sort on, but the column is not output.

This is a useful method when the number of items to sort on is relatively small.

proc sql;
create table want 
as select *
from have
order by case names
            when 'Bobby' then 1
            when 'Anna' then 2
            when 'Casper' then 3
            when 'Christine' then 4
        end;
quit;

Upvotes: 2

Joe
Joe

Reputation: 63434

The way you should store categorical variables in SAS is as numerics, with formats to show the characters. This is how other programming languages, e.g. R, handle them (factor in R).

So for example:

data have;
  length names $15;
  input names $;
  datalines;
Bobby
Anna
Casper
Christine
;;;;
run;

proc format;
  value yourformatf
   1 = 'Bobby'
   2 = 'Anna'
   3 = 'Casper'
   4 = 'Christine'
   other = ' '
   ;
  invalue yourinformati
   'Bobby' = 1
   'Anna' = 2
   'Casper' = 3
   'Christine' = 4
   other = .
   ;

quit;

data want;
  set have;
  names_cat = input(names,yourinformati.);
  format names_cat yourformatf.;
run;

Here I create a format and an informat to go back/forth (name to number, number to name). You can now sort by names_cat and it will sort as you want. You can do this programmatically (creating the format) using a cntlin dataset; search here or your search engine of choice for more information on that.

To compare to r, if that's what you're familiar with, here the numeric variable is analogous to the values in the factor variable, and the format is analogous to the labels for the levels. (There is no direct analogue to the levels themselves stored in the metadata, but many SAS procs have options to use the numbers stored in the format in the same way levels would be used in R.)

Upvotes: 4

Related Questions