Reputation: 81
I have the following problem. There are six months of transaction data per entity in a table like this:
customer_1 1000 January
customer_1 1000 January
customer_1 1000 February
customer_1 1000 February
customer_1 1000 March
customer_1 1000 April
customer_1 1000 May
customer_1 1000 June
I'd like to count the number of transactions in every two months like:
In January - February there are 4 number of transactions. In February - March there are 3 number of transactions and so on.
I tried it with nested loops, so that I define a begin and end date and then at the end of the loop I add 1 month to both and iterate again. Something like this:
do i = n to 1 by -1 while (date_key{i} >= begin_date);
do k = i to 1 by -1 while(date_key{k} >= b_d and date_key{k} <= intnx('Month',b_d,2, 's'));
if upcase(account{k}) in (&account_type) and
then do;
total_amount = sum(total_amount,currency_amount{k});
amt_cnt = amt_cnt + 1;
end;
end;
if total_amount >= threshold_total then m_cnt = sum(m_cnt,1);
b_d = intnx('Month',b_d,1,'s');
end;
Upvotes: 0
Views: 82
Reputation: 51566
You should probably just use a MULTILABEL format and a proc that supports them.
First let's create some actual sample input data.
data have ;
input id :$10. amount date :yymmdd.;
format date yymmdd.;
cards;
customer_1 1001 2021-01-01
customer_1 1002 2021-01-01
customer_1 1003 2021-02-01
customer_1 1004 2021-02-01
customer_1 1005 2021-03-01
customer_1 1006 2021-04-01
customer_1 1007 2021-05-01
customer_1 1008 2021-06-01
;
Then let's create an example format. Here is a format named PERIODS that are the first 5 two month periods in 2021.
proc format cntlout=formats;
value periods (MULTILABEL)
'01JAN2021'd -< '01MAR2021'd = '2021 01-02'
'01FEB2021'd -< '01APR2021'd = '2021 02-03'
'01MAR2021'd -< '01MAY2021'd = '2021 03-04'
'01APR2021'd -< '01JUN2021'd = '2021 04-05'
'01MAY2021'd -< '01JUL2021'd = '2021 05-06'
;
run;
Once you have the format defined apply it to your DATE variable and summarize by date.
Here is code using PROC SUMMARY.
proc summary data=have nway;
class date / mlf ;
var amount;
output out=want(rename=(date=period)) sum= ;
format date periods.;
run;
proc print data=want;
run;
And the results:
It is not hard to make format dynamically for a range of dates. You just need these variables in a dataset to use with the CNTLIN= option of PROC FORMAT.
So perhaps something like this:
data cntlin;
fmtname='PERIODS';
sexcl='N';
eexcl='Y';
hlo='SM';
length label $20 ;
do offset=0 to intck('month','01JAN2021'd,'01JAN2022'd)-1;
start = intnx('month','01JAN2021'd,offset);
end = intnx('month',start,2);
label=cat(put(start,yymmd7.),' and ',put(end-1,yymmd7.));
output;
end;
run;
Upvotes: 1