kira
kira

Reputation: 286

Check if sas datasets are sorted

I have a folder which has 100 sas data sets. I need to create a new table which has two columns (Name of data set, sorted/not sorted).

I know proc contents can be used to check the condition of sort, but is there a way to do it programatically for all 100 data sets and return the results in a single table?

Upvotes: 0

Views: 4038

Answers (2)

user667489
user667489

Reputation: 9569

Proc contents / dictionary.tables do not always tell the full story. It's sometimes possible for datasets to be in a sorted state without the metadata reflecting this, e.g. if they are generated in a sorted order without having been sorted, or as subsets of already sorted datasets. I wrote a little example to demonstrate how this can happen:

proc datasets lib = work memtype = data kill;
    run;
quit;

data  example                     /*False negative - sorted by name, but no sort metadata present*/
      example2(sortedby = weight) /*Flagged as W - false positive based on user-supplied metadata*/
      example3(sortedby = name)   /*Flagged as W - correct value based on user-supplied metadata*/
      example4                    /*Flagged as S - sorted*/
      example5                    /*Flagged as SK - sorted with no duplicate key values*/
      example6                    /*Flagged as SR - sorted with no duplicate records*/
      example7;                   /*Flagged as SR - sorted with no duplicate records*/
  set sashelp.class;
run;

proc sort data = example4;
  by name;
run;

proc sort data = example5 nodupkey;
  by name;
run;

proc sort data = example6 noduprecs;
  by sex;
run;

proc sort data = example7 nodup;
  by sex;
run;

proc sql;
  create table sorted_tables as
    select libname, memname, sorttype
    from dictionary.tables
    where libname = 'WORK'
   ;
quit;

If you wanted to be certain, one approach would be to attempt a full read of each dataset using by-group processing - an error will then show up as a non-zero value in the automatic macro variable &SYSERR if the dataset isn't sorted:

data _null_;
    set example(keep = name);
    by sex;
run;

%put &SYSERR; /*Returns 1012 - not sorted*/

data _null_;
    set example(keep = name);
    by name;
run;

%put &SYSERR; /*Returns 0 - sorted*/

This is obviously going to be quite slow, and it will also return false positives for datasets that are indexed but not sorted, but functionally there's little difference for most purposes.

Upvotes: 0

Joe
Joe

Reputation: 63424

Using DICTIONARY.TABLES, this is fairly simple.

proc sql;
  create table sorted_tables as
    select memname, sorttype
    from dictionary.tables
    where libname='MYLIB'
  ;
quit;

This is basically the programmatic version of PROC CONTENTS, and gives you access to the same information. See Kirk Lafler's excellent paper on the subject if you need more information.

"S" means sorted, "SK" means sorted-nodupkey. There may be other values, I'm not sure of all of them.

Upvotes: 5

Related Questions