Tony Beans
Tony Beans

Reputation: 195

Specify empty values by character string in PROC IMPORT

I'm coming to SAS from R in which this problem is fairly easy to solve.

I'm trying to load a bunch of CanSim CSV files (one example table here) with a %Macro function.

%Macro ReadCSV (infile , outfile );
PROC IMPORT 
        DATAFILE= &infile.
        OUT= &outfile.
        DBMS=CSV REPLACE;
 GETNAMES=YES;
 DATAROW=2;
RUN;
%Mend ReadCSV;
%ReadCSV("\\DATA\CanSimTables\02820135-eng.csv", work.cs02820135);
%ReadCSV("\\DATA\CanSimTables\02820158-eng.csv", work.cs02820158);

The problem is that the numeric Value column has ".." in all the csv's whenever the value is missing. This is creating an error when IMPORT gets to the rows with this character string.

Is there some way to tell IMPORT that any ".." should be removed or treated as missing values? (I found forums referring to the DSD option, but that doesn't seem to help me here.)

Thanks!

Upvotes: 0

Views: 2191

Answers (2)

Joe
Joe

Reputation: 63424

Not explicitly relevant for this question, but - if your issue were "N" or "D" or similar that you wanted to become missing, there would be a somewhat easier solution: the missing statement (importantly distinct from the missing option).

missing M;

That tells SAS to see a single character M in the data as a missing value, and read it in accordingly. It would read it in as .M special missing value, which is functionally similar to . regular missing (but not actually equal in an equality statement).

Upvotes: 0

Tom
Tom

Reputation: 51611

PROC IMPORT can only guess at the structure of your data. For example it might see the .. and assume the column contains a character string instead of a number. It can also make other decisions that can made the generated dataset useless.

You will be better served to write you own data step code to read the file. It is not very difficult to do. For your example linked file all I did was copy and paste the first row of the CSV file and remove the commas, make the names valid variable names and take some guesses as to how long to make the character variables.

data want ;
  infile "&path/&fname" dsd truncover firstobs=2 ;
  length Ref_Date $7 GEO $100 Geographical_classification $20
         CHARACTERISTICS $100 STATISTICS DATATYPE $50 Vector Coordinate $20
         Value 8
  ;
  input (Ref_Date -- Value) (??) ;
run;

The ?? modifier will tell SAS not to report any errors when trying the convert the text in the VALUE column into a number. So the .. and other garbage in the file will generate missing values.

Upvotes: 3

Related Questions