craze
craze

Reputation: 81

Iterate through months by twos in SAS

I have the following problem. There are six months of transaction data per entity in a table like this:

customer_1  1000  January
customer_1  1000  January
customer_1  1000  February
customer_1  1000  February
customer_1  1000  March
customer_1  1000  April
customer_1  1000  May
customer_1  1000  June

I'd like to count the number of transactions in every two months like:

In January - February there are 4 number of transactions. In February - March there are 3 number of transactions and so on.

I tried it with nested loops, so that I define a begin and end date and then at the end of the loop I add 1 month to both and iterate again. Something like this:

 do i = n to 1 by -1 while (date_key{i} >= begin_date);

        do k = i to 1 by -1 while(date_key{k} >= b_d and date_key{k} <= intnx('Month',b_d,2, 's'));

            if upcase(account{k}) in (&account_type) and
            then do;
                total_amount = sum(total_amount,currency_amount{k});
                amt_cnt = amt_cnt + 1;
            end;
        end;
        if total_amount >= threshold_total then m_cnt = sum(m_cnt,1);
        b_d = intnx('Month',b_d,1,'s');
 end;

Upvotes: 0

Views: 82

Answers (1)

Tom
Tom

Reputation: 51566

You should probably just use a MULTILABEL format and a proc that supports them.

First let's create some actual sample input data.

data have ;
  input id :$10. amount date :yymmdd.;
  format date yymmdd.;
cards;
customer_1  1001  2021-01-01
customer_1  1002  2021-01-01
customer_1  1003  2021-02-01
customer_1  1004  2021-02-01
customer_1  1005  2021-03-01
customer_1  1006  2021-04-01
customer_1  1007  2021-05-01
customer_1  1008  2021-06-01
;

Then let's create an example format. Here is a format named PERIODS that are the first 5 two month periods in 2021.

proc format cntlout=formats;
 value periods (MULTILABEL)
   '01JAN2021'd -< '01MAR2021'd = '2021 01-02'
   '01FEB2021'd -< '01APR2021'd = '2021 02-03'
   '01MAR2021'd -< '01MAY2021'd = '2021 03-04'
   '01APR2021'd -< '01JUN2021'd = '2021 04-05'
   '01MAY2021'd -< '01JUL2021'd = '2021 05-06'
 ;
run;

Once you have the format defined apply it to your DATE variable and summarize by date.

Here is code using PROC SUMMARY.

proc summary data=have nway;
  class date / mlf ;
  var amount;
  output out=want(rename=(date=period)) sum= ;
  format date periods.;
run;
proc print data=want;
run;

And the results:

enter image description here

It is not hard to make format dynamically for a range of dates. You just need these variables in a dataset to use with the CNTLIN= option of PROC FORMAT.

enter image description here

So perhaps something like this:

data cntlin;
  fmtname='PERIODS';
  sexcl='N';
  eexcl='Y';
  hlo='SM';
  length label $20 ;
  do offset=0 to intck('month','01JAN2021'd,'01JAN2022'd)-1;
    start = intnx('month','01JAN2021'd,offset);
    end = intnx('month',start,2);
    label=cat(put(start,yymmd7.),' and ',put(end-1,yymmd7.));
    output;
  end;
run;

Upvotes: 1

Related Questions