Quinn
Quinn

Reputation: 1

Convert the missing data from '?' to '.' to the entire dataset

The missing ? may be transformed to a . symbol to treat as a numeric value with 'INPUT'function.

Previous I'm using this code to change the data one by one but now i wish to change all the ? values to '.' with the same code i am using to the entire dataset.

DATA STD;
    SET  DML.RISKS;
STD_= INPUT (STDs, 32.);
KEEP STDs STD_;
RUN;

I want to apply the same function to the entire dataset.
How can I do this?

Upvotes: 0

Views: 203

Answers (1)

Kermit
Kermit

Reputation: 3117

I am not quite sure I understand what you are trying to achieve but I'll try to give you some directions.


Let's consider this example dataset of two character variables (STD1, STD2) and one numeric (STD3).

data have;
input STD1 $ STD2 $ STD3;
cards;
ABC DJI 1
? UJD 2
;

I want to apply the same function to the entire dataset

To perform an operation on all numeric or character variables in a SAS dataset, you can use arrays.

data want;
    set have;
    array num _numeric_;
    array char $ _character_;
    
    /* multiply by 10 all numeric variables*/
    do i=1 to dim(num);
        num(i)=num(i)*10;
    end;

    /* convert ? to . for character variables */
    do i=1 to dim(char);
        if char(i) = '?' then char(i) = '.';
    end;
    drop i;
run;
std1 std2 std3
abc  dji   10
 .   ujd   20

However it seems what you are trying to do is to convert all character variables to numeric.
It is a very solved problem and there are many ways to achieve this, but I will give you one possible solution.

/* list all variable names and their type from the TEST data set. */
proc contents data=have out=vars(keep=name type) noprint;
run; 
 
/* subset the data set to keep only the character.                                              */
/* A new list of numeric variable names is created from the character variable name with a "_n" */
/* appended to the end of each name.                                                            */                                                        

data vars;                                                
   set vars;                                                 
   if type=2;                               
   newname=trim(left(name))||"_n"; 
run;                                                                                                                 

/* char will contain a list of each character variable separated by a blank space */
/* num will contains a list of each numeric variable separated by a blank space   */
/* rename will contain a list of each new numeric  */
/* variable and each character variable separated by an equal sign to be used in  */ 
/* the RENAME statement.                                                          */                                                        

proc sql noprint;                                         
   select trim(left(name)), trim(left(newname)),             
          trim(left(newname))||'='||trim(left(name))         
          into :char separated by ' ', :num separated by ' ',  
          :rename separated by ' '                         
          from vars;                                                
quit;                                                                                                                                                                     

/* Convert the numeric values to character using an array and the input function */
/* DROP statement is used to prevent the character variables from being written  */
/* to the output data set, and the RENAME statement is used to rename the new    */
/* numeric variable names back to the original character variable names.         */ 

data want;                                               
   set have;                                                 
   array char(*) $ &char;                                    
   array num(*) #                                      
   do i = 1 to dim(char);                                      
      num(i)=input(char(i), 32.);                                  
   end;                                                      
   drop i &char;                                           
   rename &rename;                                                                                      
run;  
std1 std2 std3
 .    .    1
 .    .    2

Note that the character columns were all set to missing numeric values (.).

Last, I'll leave you with this: if your purpose was to convert all the character variables to numeric, the real question is why were they character in the first place? Perhaps you can take control of the import early enough so you don't have to make those adjustments.

Upvotes: 1

Related Questions