Kate McDonald
Kate McDonald

Reputation: 73

Output variables within a by group only if

My first time posting. I'm pretty new to SAS programming (actually all programming). This seems like a simple problem but can't figure it out. I have some crosstab output and I'm trying to get it into shape for easy output to tables. I want to retain the first observation in a by group if there are only 3 observations in that group. If there are more than 3 observations, I want to retain all observations but the last. So, for example, here's what I have:

Group1  Group2  Percent
var1    1   0.25
var1    1   0.75
var1    1   1
var1    2   0.4
var1    2   0.6
var1    2   1
var1    3   0.7
var1    3   0.3
var1    3   0.6
var2    1   0.1
var2    1   0.2
var2    1   0.4
var2    1   0.3
var2    1   1
var2    2   0.2
var2    2   0.2
var2    2   0.2
var2    2   0.2
var2    2   1
var2    3   0.7
var2    3   0.1
var2    3   0.05
var2    3   0.05
var2    3   0.1

and here's what I want in a new dataset

Group1  Group2  Percent
var1    1   0.25
var1    2   0.4
var1    3   0.7
var2    1   0.1
var2    1   0.2
var2    1   0.4
var2    1   0.3
var2    2   0.2
var2    2   0.2
var2    2   0.2
var2    2   0.2
var2    3   0.7
var2    3   0.1
var2    3   0.05
var2    3   0.05

Hopefully, that's clear but please let me know if more information is needed.

Upvotes: 0

Views: 155

Answers (1)

kstats9pt3
kstats9pt3

Reputation: 873

I've broken it out in a few steps to help you see the logic and have used both data steps and SQL. Basically you want to count how many are in each group and keep all counts (the count within the group and the total count) around so you can use them to make your final logic.

data test;
    length GROUP1 $5 GROUP2 PERCENT 8;
    input GROUP1 $ GROUP2 PERCENT;
    datalines;
    var1    1   0.25
    var1    1   0.75
    var1    1   1
    var1    2   0.4
    var1    2   0.6
    var1    2   1
    var1    3   0.7
    var1    3   0.3
    var1    3   0.6
    var2    1   0.1
    var2    1   0.2
    var2    1   0.4
    var2    1   0.3
    var2    1   1
    var2    2   0.2
    var2    2   0.2
    var2    2   0.2
    var2    2   0.2
    var2    2   1
    var2    3   0.7
    var2    3   0.1
    var2    3   0.05
    var2    3   0.05
    var2    3   0.1 
    ;
run;

** count the number of obs per group **;
data test_ct; set test;
    by GROUP1 GROUP2;
    COUNT + 1;
    if first.GROUP2 then COUNT = 1;
run;

** count the total number of obs per group and output on each row **;
proc sql noprint;
    create table test_ct_all as
    select *, count(*) as COUNT_TOTAL
    from test_ct group by GROUP1,GROUP2
    order by GROUP1, GROUP2, COUNT;
quit;

** logic to keep records **;
data keep_flags; set test_ct_all;
    if COUNT=1 and COUNT_TOTAL=3 then KEEP=1;

    *the last record will have COUNT and COUNT_TOTAL equal;
    if COUNT_TOTAL > 3 and (COUNT_TOTAL ne COUNT) then KEEP=1;
run;

** output only the keep records **;
data keepers; set keep_flags;
    if KEEP=1;
run;

Upvotes: 1

Related Questions