Reputation: 23
I have a variable full of ZIP code observations and I want to sort those ZIP codes into four regions based on the first three digits of the code.
For example, all ZIP codes that start with 350, 351, or 352 should be grouped into a region called "central." Those that start with 362, 368, 360 or 361 should be in a region called "east." Etc.
How do I get base SAS to look at only the first three digits of the ZIP code variable? What is the best way to associate those digits with a new variable called "region?"
Here's the code I have so far:
data work.temp;
set library.dataset;
a= substr (Zip_Code,1,3);
put a;
keep Zip_Code a;
run;
proc print data=work.temp;
run;
The column a is blank in my proc print results, however.
Thanks for your help
Upvotes: 2
Views: 682
Reputation: 3315
As @joe explains, this is due to zipcode being defined as numeric variable. I have seen this happening in one of the client locaton, that zipcode is defined as numeric. It lead to various data issues . You should try to define zipcode as character variable and then you can assign regions by using if statements or by reference table or by proc format. Below are exaples of if statement and reference tables. I find reference table method very robust.
data have;
input zip_code $;
datalines;
35099
35167
35245
36278
36899
36167
;
By if statement
data work.temp;
set have;
if in('350', '351', '352') then Region ='EAST';
if substr (Zip_Code,1,3) in('362', '368', '361') then REgion ='WEST';
run;
By use of reference table
data reference;
input code $ Region $;
datalines;
350 EAST
351 EAST
352 EAST
362 WEST
368 WEST
361 WEST
;
proc sql;
select a.*, b.region from have a
left join
reference b
on substr (Zip_Code,1,3) = code;
Upvotes: 2
Reputation: 63424
If a
is blank, then your zip_code
variable is almost certainly numeric. You probably have a note about numeric to character conversion.
SAS will happily allow you to ignore numeric and character in most instances, but it won't always give correct behavior. In this case, it's probably converting it with the BEST12
format, meaning, 60601
becomes " 60601"
. So substr(that,1,3)
gives " "
, of course.
Zip code ideally would be stored in a character variable as it's an identifier, but if it's not for whatever reason, you can do this:
a = substr(put(zip_code,z5.),1,3);
The Zw.d
format is correct since you want Massachusetts to be "02101"
and not "2101 "
.
Upvotes: 1