Erratic Assassin
Erratic Assassin

Reputation: 33

SAS - Extracting observations from a character variable with characters OTHER than the numeric digits or "."?

I'm trying to figure out a small coding challenge here.

I have a variable, RESULT, that is a character variable, but needs to be converted to numeric. Most of the results are regular numbers, i.e. "90", "90.0", "55.42", etc. However, there are a lot of weird results, such as "UNDETECTABLE" or "1.29E7" or such results.

What I want to do is extract all the observations that have a character OTHER than the numeric digits OR the value "." (i.e. a period). Then I can manually assign those values.

I have a very large dataset, but limited computing power, so I can't scroll through and pick out the odd observations with special characters. It just freezes up my computer and takes way too long.

Thoughts on how to best accomplish this? Is there a SAS function that works for such a task? I've thought about the compress function, but I need to make sure I'm not missing any observations with special characters (i.e. characters other than numbers and period).

Thank you!

Upvotes: 2

Views: 494

Answers (3)

Tom
Tom

Reputation: 51621

Couldn't you just get the distinct set of non-number values? That should be smaller than taking every observation that has a non-number value.

One way to test for a valid number is let SAS do it for you. The INPUT() function can convert text strings to numbers. If you use the COMMA informat then in addition to properly converting scientific notation values like 1.29E7, which is just 12900000, it can also handle values with commas or dollar signs.

 proc sql ;
    create table want as
      select distinct result
      from have
      where result not in (' ','.') 
        and input(result,comma32.)=.
    ;
quit;

That should find values like "UNDETECTABLE", but treat values like "90", "90.0", "55.42", "1.29E7", or "12,345" as valid numbers.

Upvotes: 0

burduk
burduk

Reputation: 152

Try this:

data out; /*output dataset*/
set in;  /*input dataset*/
result = trim(result);  /*trailing blanks - not have to be a problem*/
clear_number = compress(result, '.'); /*remove period from result*/
/*then, clear_number have to have only digits, so:*/
if notdigit(clear_number) then delete;
/*but, maybe, result have more then one period?*/
if count(result, '.') > 1 then delete;
result_numeric = result*1; /*lazy convertion*/
run;

Upvotes: 0

Joe
Joe

Reputation: 63434

COMPRESS will handle this for you nicely, based on your precise language. Use the list modifier to add digits (3rd parameter) plus '.' from the second parameter.

Note this won't identify numbers that are not valid numbers (like the last one).

data have;
  input @1 char_var $30.;
datalines;
1.234
4.15E7
UNDETECTED
-143.32
+144.12
79.32°F
14.14.14
;;;;
run;

data want;
  set have;
  if compress(char_Var,'.','d') ne ' ';
run;

Upvotes: 1

Related Questions