Wolff
Wolff

Reputation: 1091

How to pad a number with leading zero in a SAS Macro loop counter?

So I have a range of datasets in a specific library. These datasets are named in the format DATASET_YYYYMM, with one dataset for each month. I am trying to append a range of these datasets based on user input for the date range. i.e. If start_date is 01NOV2019 and the end_date is 31JAN2020, I want to append the three datasets: LIBRARY.DATASET_201911, LIBRARY.DATASET_201912 and LIBRARY.DATASET_202001.

The range is obviously variable, so I can't simply name the datasets manually in a set function. Since I need to loop through the years and months in the date range, I believe a macro is the best way to do this. I'm using a loop within the SET statement to append all the datasets. I have copied my example code below. It does work in theory. But in practice, only if we are looping over the months of November and December. As the format of the dataset name has a two digit month, for Jan-Sept it will be 01-09. The month function returns 1-9 however, and of course a 'File DATASET_NAME does not exist' error is thrown. Problem is I cannot figure out a way to get it to interpret the month with leading 0, without ruining functionality of another part of the loop/macro.

I have tried numerous approaches to format the number as z2, cannot get any to work. i.e. Including PUTN functions in the DO line for quote_month as follows, it ignores the leading zero when generating the dataset name in the line below.

%DO quote_month = %SYSFUNC(IFN(&quote_year. = &start_year.,%SYSFUNC(PUTN(&start_month.,z2.)),1,.)) %TO %SYSFUNC(IFN(&quote_year. = &end_year.,%SYSFUNC(PUTN(&end_month.,z2.)),12,.));

Below is example code (without any attempt to reformat it to z2) - it will throw an error because it cannot find 'dataset_20201' because it is actually called 'dataset_202001'. The dataset called dataset_combined_example produces the desired output of the code by manually referencing the dataset names which it will be unable to do in practice. Does anyone know how to go about this?

DATA _NULL_;
    FORMAT start_date end_date DATE9.;
    start_date = '01NOV2019'd;
    end_date = '31JAN2020'd;
    CALL symput('start_date',start_date);
    CALL symput('end_date',end_date);
RUN;

DATA dataset_201911;
   input name $;
   datalines;
Nov1
Nov2
;
RUN;
DATA dataset_201912;
   input name $;
   datalines;
Dec1
Dec2
;
RUN;
DATA dataset_202001;
   input name $;
   datalines;
Jan1
Jan2
;
RUN;

DATA dataset_combined_example;
    SET dataset_201911 dataset_201912 dataset_202001;
RUN;


%MACRO get_table(start_date, end_date);
    %LET start_year = %SYSFUNC(year(&start_date.));
    %LET end_year = %SYSFUNC(year(&end_date.));
    %LET start_month = %SYSFUNC(month(&start_date.));
    %LET end_month = %SYSFUNC(month(&end_date.));
    DATA dataset_combined;
        SET
            %DO quote_year = &start_year. %TO &end_year.;
                %DO quote_month = %SYSFUNC(IFN(&quote_year. = &start_year.,&start_month.,1,.)) %TO %SYSFUNC(IFN(&quote_year. = &end_year.,&end_month.,12,.));
                    dataset_&quote_year.&quote_month.
                %END;
            %END;
        ;
    RUN;
%MEND;
%get_table(&start_date.,&end_date.);

Upvotes: 1

Views: 2172

Answers (4)

Tom
Tom

Reputation: 51566

You can use the Z format to generate strings with leading zeros.

But your problem is much easier if you use SAS date functions and formats to generate the YYYYMM strings. Just use a normal iterative %DO loop to cycle the month offset from zero to the number of months between the two dates.

%macro get_table(start_date, end_date);
%local offset dsname ;
data dataset_combined;
  set
%do offset=0 %to %sysfunc(intck(month,&start_date,&end_date));
  %let dsname=dataset_%sysfunc(intnx(month,&start_date,&offset),yymmn6);
  &dsname.
%end;
  ;
run;
%mend get_table;

Result:

445   options mprint;
446   %get_table(start_date='01NOV2019'd,end_date='31JAN2020'd);
MPRINT(GET_TABLE):   data dataset_combined;
MPRINT(GET_TABLE):   set dataset_201911 dataset_201912 dataset_202001 ;
MPRINT(GET_TABLE):   run;

Upvotes: 1

Richard
Richard

Reputation: 27498

In a macro

  • Use INTNX to compute the bounds for a loop over date values. Within the loop:
    • Compute the candidate data set name according to specified lib, prefix and desired date value format. <yyyy><mm> is output by format yymmn6.
    • Use EXIST to check candidate data sets for existence.
      • Alternatively, do not check, but make sure to set OPTIONS NODSNFERR prior to combining. The setting will prevent errors when specifying a non-existent data set.
    • Update the loop index to the end of the month so the next increment takes the index to the start of the next month.

%macro names_by_month(lib=work, prefix=data_, start_date=today(), end_date=today(), format=yymmn6.);

  %local index name;

  %* loop over first-of-the-month date values;
  %do index = %sysfunc(intnx(month, &start_date, 0)) %to %sysfunc(intnx(month, &end_date, 0));

    %* compute month dependent name;
    %let name = &lib..&prefix.%sysfunc(putn(&index,&format));

    %* emit name if it exists;
    %if %sysfunc(exist(&name)) or %sysfunc(exist(&name,VIEW)) %then %str(&name);

    %* prepare index for loop +1 increment so it goes to start of next month;
    %let index = %sysfunc(intnx(month, &index, 0, E));
  %end;  

%mend;

* example usage:

data combined_imports(label="nov2019 to jan2020");
  set 
    %names_by_month(
      prefix=import_,
      start_date='01NOV2019'd,
      end_date = '31JAN2020'd
    )
  ;
run;

Upvotes: 0

user667489
user667489

Reputation: 9569

You can also do this using the metadata tables without having to resort to macro loops in the first place:

/* A few datasets to combine */
data 
  DATASET_201910
  DATASET_201911
  DATASET_201912
  DATASET_202001
  ;
run;

%let START_DATE = '01dec2019'd;
%let END_DATE = '31jan2020'd; 

proc sql noprint;
  select catx('.', libname, memname) into :DS_LIST separated by ' '
  from dictionary.tables
  where 
    &START_DATE <= 
      case 
        when prxmatch('/DATASET_\d{6}/', memname) 
          then input(scan(memname, -1, '_'), yymmn6.)
        else -99999 
      end 
    <= &END_DATE 
    and libname = 'WORK'
  ;
quit;

data combined_datasets /view=combined_datasets;
  set &DS_LIST; 
run;

The case-when in the where clause ensures that any other datasets present in the same library that don't match the expected naming scheme are ignored.

One key difference with this approach is that you will never end up attempting to read a dataset that doesn't exist if one of the expected datasets in your range is missing.

Upvotes: 1

Llex
Llex

Reputation: 1770

You could do this using putn and z2. format.

%DO quote_year = &start_year. %TO &end_year.;
     %DO quote_month = %SYSFUNC(IFN(&quote_year. = &start_year.,&start_month.,1,.)) %TO %SYSFUNC(IFN(&quote_year. = &end_year.,&end_month.,12,.));
            dataset_&quote_year.%sysfunc(putn(&quote_month.,z2.))
     %END;
%END;

Upvotes: 3

Related Questions