Tom Dw
Tom Dw

Reputation: 53

input an array in SAS

I need to read multiple raw text files into a SAS-dataset. Each file consists several ingredients as shown in the example files below. Each file (a dish) lists all the ingredients on one line, separated by a comma. The amount of ingredients is variable. Some example files (dishes):

Example file 1 (dish1.csv):

Tomate, Cheese, Ham, Bread

Example file 2 (dish2.csv):

Sugar, Apple

Example file 3 (dish3.csv):

Milk, Sugar, Cacao

Because I have about 250 files (dishes) I created a macro program to read those files. That way I can execute this macro in another macro to read all the dishes I need. The program looks like this:

%readDish (dishNumber);
    data newDish;
        * Find and read the csv-file;
        infile "my_file_location/dish&dishNumber..csv" dlm=";" missover;

        * Read up to 25 ingredients;
        input ingredient1-ingredient25 : $25.;

        * Put all ingredients in an array;
        array ingredients{25} ingredient1-ingredient25;

        * Loop thrue all the ingredients and output;
        do i=1 to dim(ingredients);
            dishNumber = &dishNumber;
            ingredient = ingredients{i};
            output;
        end;
    run;
%mend;

Is it possible to create a SAS (macro) program that is able to read all dishes, no matter how many ingredients I have? The SAS table should look like this:

1 Tomate
1 Cheese
1 Ham
1 Bread

Upvotes: 0

Views: 809

Answers (1)

Joe
Joe

Reputation: 63424

Seems straightforward to me: read the data in vertically, then if you need it horizontal, add a transpose step afterwards. You don't have to read in a whole line in one step - the @@ operator tells SAS to keep the line pointer on that line, so you just read in the one.

data dishes;
  length _file $1024 
         ingredient $128;
  infile "c:\temp\dish*.csv" dlm=',' filename=_file lrecl=32767; *or whatever your LRECL needs to be;
  input ingredient $ @@;
  dishnumber = input(compress(scan(_file,-2,'\.'),,'kd'),12.);
  output;
run;

Here I use a wildcard to read them all in - you can of course us a macro with similar code if you need to, though wildcard or a concatenated filename is probably easier. The way I get dishnumber might not always work depending on the filename construction, but some form of that should be usable.

To expand on why this works: The way the datastep works in SAS is that it is a constant loop, looping over the code repeatedly until it encounters an "end condition". End conditions are, most commonly, the stop keyword, and then any attempt to read from a SET or INFILE where no further read is possible (i.e., you read a 100 line SAS dataset, and it tries to read row 101 in, fails, so ends the data step). However, other than that, it will keep doing the same code until it gets there. It just does some cleanup at the "run" point to make sure it is not infinitely looping.

In the case of input from infiles, usually SAS reads a line, then at the RUN, it will skip forward to the next EOL (end of line, usually a carriage return and linefeed in Windows) if it's not already at one. Sometimes that is useful - perhaps, usually. But, in some cases you'd rather ask SAS to keep reading the same line.

In comes the @@ operator. @@ says "do not advance to EOL even if you hit RUN". (@ says "Do not advance to EOL except when you hit RUN" - normally input itself causes SAS to read until EOL.) Thus, when you perform the next data step iteration, the input pointer will be in the same exact place you left it - right after the previous field you read in.

This was highly useful in the 60s and 70s, when punchcards were the trendy new thing, and you would put lines of input often without regard to any line organization - in particular, if you input just one variable per row, at 8 columns per input variable, you're not wasting 72 blocks from one punchcard - so, you have input just like your ingredients: many pieces of data per row on the input, which then want to be translated into one piece of data per row in memory. While it's not as common nowadays to store data this way, this is certainly possible - as your data exemplify.

Upvotes: 1

Related Questions