Reputation: 530
To make it simple let's say I have a dataset consiting of four names: Anna, Bobby , Casper, Christine. The column name is just 'Names'.
I want to sort it in this order: Bobby, Anna, Casper, Christine. I cant use 'proc sort' with 'asc/desc' here. Because it's ordered randomly I need to type in the order manually.
Can I somehow include it the following proc sort statement?
Proc Sort
data = dataset; order by Names;
run;
Upvotes: 1
Views: 4113
Reputation: 7602
The answer from @Joe is the best way due to being scalable and the ability to read in formats from a dataset using CNTLIN. I just thought I'd post an alternative solution, using proc sql
. The version SAS uses enables you to create a custom order on the fly by adding an order by
statement along with a case
statement. Effectively this creates an extra column in memory which is used to sort on, but the column is not output.
This is a useful method when the number of items to sort on is relatively small.
proc sql;
create table want
as select *
from have
order by case names
when 'Bobby' then 1
when 'Anna' then 2
when 'Casper' then 3
when 'Christine' then 4
end;
quit;
Upvotes: 2
Reputation: 63434
The way you should store categorical variables in SAS is as numerics, with formats to show the characters. This is how other programming languages, e.g. R, handle them (factor
in R).
So for example:
data have;
length names $15;
input names $;
datalines;
Bobby
Anna
Casper
Christine
;;;;
run;
proc format;
value yourformatf
1 = 'Bobby'
2 = 'Anna'
3 = 'Casper'
4 = 'Christine'
other = ' '
;
invalue yourinformati
'Bobby' = 1
'Anna' = 2
'Casper' = 3
'Christine' = 4
other = .
;
quit;
data want;
set have;
names_cat = input(names,yourinformati.);
format names_cat yourformatf.;
run;
Here I create a format and an informat to go back/forth (name to number, number to name). You can now sort by names_cat
and it will sort as you want. You can do this programmatically (creating the format) using a cntlin
dataset; search here or your search engine of choice for more information on that.
To compare to r
, if that's what you're familiar with, here the numeric variable is analogous to the values in the factor variable, and the format is analogous to the labels for the levels. (There is no direct analogue to the levels themselves stored in the metadata, but many SAS procs have options to use the numbers stored in the format in the same way levels would be used in R.)
Upvotes: 4