KS Alabama
KS Alabama

Reputation: 23

Base SAS 9.2-- SUBSTR function to classify Zip Codes into regions

I have a variable full of ZIP code observations and I want to sort those ZIP codes into four regions based on the first three digits of the code.

For example, all ZIP codes that start with 350, 351, or 352 should be grouped into a region called "central." Those that start with 362, 368, 360 or 361 should be in a region called "east." Etc.

How do I get base SAS to look at only the first three digits of the ZIP code variable? What is the best way to associate those digits with a new variable called "region?"

Here's the code I have so far:

data work.temp;
    set library.dataset;
    a= substr (Zip_Code,1,3);
    put a;
    keep Zip_Code a;
run;

proc print data=work.temp;
run;

The column a is blank in my proc print results, however.

Thanks for your help

Upvotes: 2

Views: 682

Answers (2)

Kiran
Kiran

Reputation: 3315

As @joe explains, this is due to zipcode being defined as numeric variable. I have seen this happening in one of the client locaton, that zipcode is defined as numeric. It lead to various data issues . You should try to define zipcode as character variable and then you can assign regions by using if statements or by reference table or by proc format. Below are exaples of if statement and reference tables. I find reference table method very robust.

data have;
input zip_code $;
datalines;
35099
35167
35245
36278
36899
36167
;

By if statement

data work.temp;
set have;
if   in('350', '351', '352') then Region ='EAST';
if  substr (Zip_Code,1,3) in('362', '368', '361') then  REgion ='WEST';
run;

By use of reference table

data reference;
input code $ Region $;
datalines;
350 EAST
351 EAST
352 EAST
362 WEST
368 WEST
361 WEST
;


proc sql;
select a.*, b.region from have a
left join
 reference b
 on substr (Zip_Code,1,3) = code;

Upvotes: 2

Joe
Joe

Reputation: 63424

If a is blank, then your zip_code variable is almost certainly numeric. You probably have a note about numeric to character conversion.

SAS will happily allow you to ignore numeric and character in most instances, but it won't always give correct behavior. In this case, it's probably converting it with the BEST12 format, meaning, 60601 becomes " 60601". So substr(that,1,3) gives " ", of course.

Zip code ideally would be stored in a character variable as it's an identifier, but if it's not for whatever reason, you can do this:

a = substr(put(zip_code,z5.),1,3);

The Zw.d format is correct since you want Massachusetts to be "02101" and not "2101 ".

Upvotes: 1

Related Questions