astel
astel

Reputation: 192

Read multiple txt files from different folders into SAS dataset

I have the following problem, and I don't really know where to begin. I have a folder called "ALL" and inside that folder there are sub-folders with titles equal to the date they were created in the format DD-MM-YYYY. There is a folder for every day, ie no missing days. Inside each of those folders there are numerous txt files. I would like to read one of these text files from each of the date folders. That file will have a naming convention of "thedata_" followed by a random series of numbers.

So for example, if there are 3 date folders in the ALL folder, then I would like to read 3 separate "thedata_" text files into 1 final SAS file. And subsequently each day a new folder is added, I want to append the "thedata_" file from that folder to the existing SAS file rather than rerun the script from scratch.

Upvotes: 0

Views: 1253

Answers (1)

Stu Sztukowski
Stu Sztukowski

Reputation: 12909

Here's one solution. This uses SAS functions to read and populate a dataset that reads every file in every folder so that you do not need to turn on x commands. You can save each one to a macro variable, then loop through and read each file however you'd like. You can modify this to work with the filevar option.

filename all "Directory/ALL";

data myfiles;
    length folder_name 
           file_name  
           file
           folder_path $5000.
    ;

    /* Folder delimiter */
    if("&sysscp." = "WIN") then SLASH = '\';
        else SLASH = '/';

    /* Open the ALL directory */
    did = dopen("all");

    /* If it was successful, continue */
    if(did) then do;  

        /* Iterate through all subfolders in ALL */
        do i = 1 to dnum(did);

            /* Get the subfolder name and full path */
            folder_name = dread(did, i);
            folder_path = cats(pathname('all'), SLASH, folder_name);

            /* Assign a filename statement to the subfolder */
            rc = filename('sub', folder_path);
            
            /* Give the sub-folder a a directory ID */
            did2 = dopen('sub');

            /* Open the subfolder and read all the .txt files within it */
            if(did2) then do;
                do j = 1 to dnum(did2);

                    file_name = dread(did2, j);
                    file_ext  = scan(file_name, -1, '.');
                    file      = cats(folder_path, SLASH, file_name);
                    
                    /* Save file name only if the expected value is found */
                    if(upcase(file_name) =: "THEDATA_" AND upcase(file_ext) = "TXT") then do;
                        nfiles+1;
                        call symputx(cats('file', nfiles), file); /* Save each file to a macro variable named file1, file2, etc. */
                        output;
                    end;
                end;
            end;

            /* Close the subfolder and move on to the next one */
            rc = dclose(did2);
        end;

    end;

    rc = dclose(did);

    /* Save the total number of files we found to a macro variable */
    call symputx('nFiles', nFiles);

    keep file file_name folder_name folder_path;
run;

/* Read all the files */
%macro readFiles;
    %do i = 1 %to &nFiles.;
        proc import 
            file = "&&file&i."
            out  =  _thedata_&i.
            dbms =  csv
            replace;
            guessingrows=max;
        run;
    %end;

    /* Put all the files together */
    data thedata;
        set _thedata_:;
    run;

    proc datasets lib=work nolist;
        delete _thedata_:;
    quit;
%mend;
%readFiles;

Upvotes: 1

Related Questions