Looping with PROC SQL

Question

I have a dataset that looks like this (shown in two parts below; the suffixes 1 to 5 are months):

date acct seg_1 seg_2 seg_3 seg_4 seg_5 
1/20  A     x     x     x     x     y
1/20  B     x     y     x     x     y
1/20  C     y     y     x     x     x


date acct abc_1 abc_2 abc_3 abc_4 abc_5 
1/20  A     0     0     1     1     1
1/20  B     1     1     0     1     1
1/20  C     1     0     1     0     1

For each segment column seg_t, I want to count the accounts in each segment where abc_(t+1)=0 and abc_(t+2)=1. Below is my code, which gives me the results I want without looping:

proc sql;
    create table want_1 as
    select distinct seg_1 as segment, (count(acct)) as count_1
    from have
    where abc_2 = 0 and abc_3 = 1
    group by seg_1;

    create table want_2 as
    select distinct seg_2 as segment, (count(acct)) as count_2
    from have
    where abc_3 = 0 and abc_4 = 1
    group by seg_2;

    create table want_3 as
    select distinct seg_3 as segment, (count(acct)) as count_3
    from have
    where abc_4 = 0 and abc_5 = 1
    group by seg_3;
quit;

However, I have 84 months of data, so I would like to wrap this in a macro loop and then combine the results for all months. I would appreciate help fixing my failed attempt below:

%macro loop(a,b);
    proc sql;
        %do x=&a. %to &b.;
            %do i=&a.+1 %to &b.+1;
                %do j=&a.+2 %to &b.+2;
                    create table want_&x. as
                    select distinct seg_&x. as segment, count(acct) as count_&x.
                    from have
                    where abc_&i. = 0 and abc_&j. = 1
                    group by seg_&x.;
                %end;
            %end;
        %end;
    quit;
%mend;
%loop(a=1,b=84);
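The macro above fails mainly because its three %DO loops are nested instead of moving in step. For every X, I runs through all of its values, and for every I, J runs through all of its values. That generates 84*84*84 CREATE TABLE statements, overwrites each WANT_&X. repeatedly, and references columns such as ABC_85 and ABC_86 that do not exist. Since I is always X+1 and J is always X+2, one loop is enough, and it has to stop at month N-2. Below is a minimal sketch of a single-loop version, assuming the SEG_/ABC_ column names from the question. The macro parameters DATA= and NMONTHS= and the dataset name SAMPLE are hypothetical. It uses SUM() of the 0/1 condition instead of COUNT() with a WHERE clause, so a segment that appears in SEG_&X. but never meets the condition gets 0 instead of dropping out. It then merges the monthly tables by SEGMENT.

```sas
/* Sample data using the column names from the question (hypothetical name SAMPLE) */
data sample;
  input date $ acct $ (seg_1-seg_5) (:$8.) abc_1-abc_5;
cards;
1/20 A x x x x y 0 0 1 1 1
1/20 B x y x x y 1 1 0 1 1
1/20 C y y x x x 1 0 1 0 1
;

/* NMONTHS = number of seg_/abc_ columns (84 in the real data).       */
/* Month X uses ABC_(X+1) and ABC_(X+2), so only NMONTHS-2 tables exist. */
%macro loop(data=, nmonths=);
  %local x;
  proc sql;
    %do x=1 %to %eval(&nmonths-2);
      create table want_&x. as
        select seg_&x. as segment
             /* the condition evaluates to 1 or 0, so SUM counts matches */
             , sum(abc_%eval(&x+1)=0 and abc_%eval(&x+2)=1) as count_&x.
        from &data.
        group by seg_&x.
        order by segment;
    %end;
  quit;

  /* Combine the monthly tables side by side, matching on SEGMENT */
  data want;
    merge %do x=1 %to %eval(&nmonths-2); want_&x. %end; ;
    by segment;
  run;
%mend loop;

%loop(data=sample, nmonths=5)
```

With the sample rows this should give one row per segment, with COUNT_3 missing for y because y never appears in SEG_3.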

Ideally, the combined results (using Segment as the unique identifier) should look like this:

Segment  Count_1 Count_2 Count_3
   x        1       0       1
   y        1       1       0

Note: I tried transposing my data, but it has over 41 million rows. I would appreciate it if someone could also suggest a DATA step alternative!

Tom · Accepted Answer

It is still not at all clear what your algorithm is. So let's take a shot at it and see if this is what you mean. First let's convert your pasted listing into actual data.

data have ;
  input date $ acct $ (seg1-seg5) (:$8.) abc1-abc5;  /* SEG1-SEG5 are character */
cards;
1/20 A X X X X Y 0 0 1 1 1
1/20 B X Y X X Y 1 1 0 1 1
1/20 C Y Y X X X 1 0 1 0 1
;

So there are 5 months here. Since it looks like for month 3 you need to look at SEG3, ABC4 and ABC5, you will get 2 fewer months out than you have in the data (3 from the 5 here, 82 from your 84). Let's convert this to a tall format. We can use a view so that we don't need to permanently store the vertical dataset.

data step1 / view=step1 ;
  set have ;
  array seg [5];   /* use 84 for the real data */
  array abc [5];
  do month=1 to dim(seg)-2;
   segment=seg[month];
   current=abc[month+1];
   next=abc[month+2];
   count_me=(current=0 and next=1);   /* 1 if the condition holds, else 0 */
   output;
  end;
  keep date acct month segment current next count_me;
run;

Now we can add up the COUNT_ME flags for each SEGMENT*MONTH combination, for example with PROC SQL.

proc sql ;
 create table step2 as
 select segment,month
     , sum(count_me) as method1
 from step1
 group by segment,month
 order by segment,month   /* PROC TRANSPOSE BY needs sorted input */
;
quit;
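PROC SUMMARY can build the same STEP2 table and may be worth trying on the full 41-million-row data; this is a sketch that has not been benchmarked against the SQL version. It assumes the STEP1 view above already exists. NWAY keeps only the full SEGMENT*MONTH combinations, and the output comes out sorted by the CLASS variables, which is the order the BY statement in PROC TRANSPOSE needs. The sum is named METHOD1 so the transpose step can stay unchanged.

```sas
/* Alternative to the PROC SQL step: sum COUNT_ME per SEGMENT*MONTH. */
/* Assumes the STEP1 view defined above already exists.              */
proc summary data=step1 nway;
  class segment month;   /* one output row per SEGMENT*MONTH pair */
  var count_me;
  output out=step2(drop=_type_ _freq_) sum=method1;
run;
```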

Then, to get a dataset in the format you show, we just need to transpose it.

proc transpose data=step2 prefix=count_ out=want(drop=_name_);
  by segment ;
  id month;
  var method1 ;
run;

Results:

Obs    segment    count_1    count_2    count_3

 1        X          1          0          1
 2        Y          1          1          .

Notice how there is no value for COUNT_3 for SEGMENT=Y, since Y never appeared in SEG3 in the sample input.
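The layout in the question shows 0 where this result has a missing value. If zeros are preferred, a short DATA step can fill them in after the transpose. This is a minimal sketch, assuming the WANT dataset from the PROC TRANSPOSE step above; the output name WANT_ZERO is just an illustrative choice. The COUNT_: name prefix list picks up every COUNT_ variable, so the same step should work however many months there are.

```sas
data want_zero;
  set want;
  array counts count_: ;   /* all variables whose names start with COUNT_ */
  do i=1 to dim(counts);
    if missing(counts[i]) then counts[i]=0;
  end;
  drop i;
run;
```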
