Reputation: 73
My first time posting. I'm pretty new to SAS programming (actually all programming). This seems like a simple problem but can't figure it out. I have some crosstab output and I'm trying to get it into shape for easy output to tables. I want to retain the first observation in a by group if there are only 3 observations in that group. If there are more than 3 observations, I want to retain all observations but the last. So, for example, here's what I have:
Group1 Group2 Percent
var1 1 0.25
var1 1 0.75
var1 1 1
var1 2 0.4
var1 2 0.6
var1 2 1
var1 3 0.7
var1 3 0.3
var1 3 0.6
var2 1 0.1
var2 1 0.2
var2 1 0.4
var2 1 0.3
var2 1 1
var2 2 0.2
var2 2 0.2
var2 2 0.2
var2 2 0.2
var2 2 1
var2 3 0.7
var2 3 0.1
var2 3 0.05
var2 3 0.05
var2 3 0.1
and here's what I want in a new dataset
Group1 Group2 Percent
var1 1 0.25
var1 2 0.4
var1 3 0.7
var2 1 0.1
var2 1 0.2
var2 1 0.4
var2 1 0.3
var2 2 0.2
var2 2 0.2
var2 2 0.2
var2 2 0.2
var2 3 0.7
var2 3 0.1
var2 3 0.05
var2 3 0.05
Hopefully, that's clear but please let me know if more information is needed.
Upvotes: 0
Views: 155
Reputation: 873
I've broken it out in a few steps to help you see the logic and have used both data steps and SQL. Basically you want to count how many are in each group and keep all counts (the count within the group and the total count) around so you can use them to make your final logic.
data test;
length GROUP1 $5 GROUP2 PERCENT 8;
input GROUP1 $ GROUP2 PERCENT;
datalines;
var1 1 0.25
var1 1 0.75
var1 1 1
var1 2 0.4
var1 2 0.6
var1 2 1
var1 3 0.7
var1 3 0.3
var1 3 0.6
var2 1 0.1
var2 1 0.2
var2 1 0.4
var2 1 0.3
var2 1 1
var2 2 0.2
var2 2 0.2
var2 2 0.2
var2 2 0.2
var2 2 1
var2 3 0.7
var2 3 0.1
var2 3 0.05
var2 3 0.05
var2 3 0.1
;
run;
** count the number of obs per group **;
data test_ct; set test;
by GROUP1 GROUP2;
COUNT + 1;
if first.GROUP2 then COUNT = 1;
run;
** count the total number of obs per group and output on each row **;
proc sql noprint;
create table test_ct_all as
select *, count(*) as COUNT_TOTAL
from test_ct group by GROUP1,GROUP2
order by GROUP1, GROUP2, COUNT;
quit;
** logic to keep records **;
data keep_flags; set test_ct_all;
if COUNT=1 and COUNT_TOTAL=3 then KEEP=1;
*the last record will have COUNT and COUNT_TOTAL equal;
if COUNT_TOTAL > 3 and (COUNT_TOTAL ne COUNT) then KEEP=1;
run;
** output only the keep records **;
data keepers; set keep_flags;
if KEEP=1;
run;
Upvotes: 1