Lorem Ipsum
Lorem Ipsum

Reputation: 4524

PROC TABULATE Rearrange header subgroups independent of the data

This is a follow up to a previous question which considered only a single variable with two subgroups. In that question, the solution was to order the subgroups by sorting the data.

That approach, however, breaks down when there are several variables. It becomes a whack-a-mole of sorting the variables into a sequence which produces the desired result. Each subsequent BY variable subgroup depends on the previous levels. If not all subgroups are present in a higher level group, then not all subgroups can be arranged into the desired order.

Either PROC TABULATE is not the appropriate tool for this task (i.e. obtaining percents of various groups across several variables) or there is a technique that allows for subgroups to be arranged independently of the data.

Do I go digging for the needle in the documentation haystack or do I reinvent the wheel? Any insights you could give me would be appreciated.


Example:

To give an illustration, say that I want to create a table with the subgroups of each variable arranged in (Y, N) order. Notice how var4 is not output in the correct order. By the time the other variables have been sorted, there aren't enough var4 values present to be sorted into the desired order.

data example;
    input group $ var1 $ var2 $ var3 $ var4 $;
    datalines;
    1 Y Y N Y
    1 N Y N N
    2 Y N Y N
    2 Y Y Y N
    3 N N N Y
    3 N Y Y N
    ;
run;

proc sort data = example out = sorted;
    by  descending var1
        descending var2
        descending var3
        descending var4
        ;
run;

title 'Percent';
proc tabulate data = sorted order = data;
    class group var1 var2 var3 var4;
    table group='Group', 
            all  = 'Total'*pctn='' 
            var1 = 'Variable 1'*pctn=''
            var2 = 'Variable 2'*pctn=''
            var3 = 'Variable 3'*pctn=''
            var4 = 'Variable 4'*pctn='';
run;

Program Output

It may be possible to devise a combination of BY variables in the PROC SORT which give a (Y, N) subgrouping order, but it would involve a bunch of fiddling which is not robust against changes in data. If the table needs to be updated monthly, then each month you would have to fiddle with the sorting.

Upvotes: 0

Views: 457

Answers (1)

user667489
user667489

Reputation: 9569

No workaround or reinventing the wheel required - this is exactly what classdata datasets are for:

data example;
    input group (var1-var4) ($1. +1);
    datalines;
    1 Y Y N Y
    1 N Y N N
    2 Y N Y N
    2 Y Y Y N
    3 N N N Y
    3 N Y Y N
    ;
run;

data classtypes;
    do group = 1 to 3;
        do var1 = 'Y','N';
            do var2 = 'Y','N';      
                do var3 = 'Y','N';                  
                    do var4 = 'Y','N';
                        output;
                    end;
                end;
            end;
        end;            
    end;
run;

title 'Percent';
proc tabulate data = example order = data classdata=classtypes;
    class group var1 var2 var3 var4;
    table group='Group', 
            all  = 'Total'*pctn='' 
            var1 = 'Variable 1'*pctn=''
            var2 = 'Variable 2'*pctn=''
            var3 = 'Variable 3'*pctn=''
            var4 = 'Variable 4'*pctn='';
run;

As a bonus, this also avoids having to sort your main input dataset - row/column order in the output table is determined by the order of the classdata dataset.

Upvotes: 1

Related Questions