Reputation: 33
I'm trying to figure out a small coding challenge here.
I have a variable, RESULT, that is a character variable, but needs to be converted to numeric. Most of the results are regular numbers, i.e. "90", "90.0", "55.42", etc. However, there are a lot of weird results, such as "UNDETECTABLE" or "1.29E7" or such results.
What I want to do is extract all the observations that have a character OTHER than the numeric digits OR the value "." (i.e. a period). Then I can manually assign those values.
I have a very large dataset, but limited computing power, so I can't scroll through and pick out the odd observations with special characters. It just freezes up my computer and takes way too long.
Thoughts on how to best accomplish this? Is there a SAS function that works for such a task? I've thought about the compress function, but I need to make sure I'm not missing any observations with special characters (i.e. characters other than numbers and period).
Thank you!
Upvotes: 2
Views: 494
Reputation: 51621
Couldn't you just get the distinct set of non-number values? That should be smaller than taking every observation that has a non-number value.
One way to test for a valid number is let SAS do it for you. The INPUT()
function can convert text strings to numbers. If you use the COMMA
informat then in addition to properly converting scientific notation values like 1.29E7
, which is just 12900000, it can also handle values with commas or dollar signs.
proc sql ;
create table want as
select distinct result
from have
where result not in (' ','.')
and input(result,comma32.)=.
;
quit;
That should find values like "UNDETECTABLE", but treat values like "90", "90.0", "55.42", "1.29E7", or "12,345" as valid numbers.
Upvotes: 0
Reputation: 152
Try this:
data out; /*output dataset*/
set in; /*input dataset*/
result = trim(result); /*trailing blanks - not have to be a problem*/
clear_number = compress(result, '.'); /*remove period from result*/
/*then, clear_number have to have only digits, so:*/
if notdigit(clear_number) then delete;
/*but, maybe, result have more then one period?*/
if count(result, '.') > 1 then delete;
result_numeric = result*1; /*lazy convertion*/
run;
Upvotes: 0
Reputation: 63434
COMPRESS
will handle this for you nicely, based on your precise language. Use the list modifier to add digits (3rd parameter) plus '.' from the second parameter.
Note this won't identify numbers that are not valid numbers (like the last one).
data have;
input @1 char_var $30.;
datalines;
1.234
4.15E7
UNDETECTED
-143.32
+144.12
79.32°F
14.14.14
;;;;
run;
data want;
set have;
if compress(char_Var,'.','d') ne ' ';
run;
Upvotes: 1