hope288
hope288

Reputation: 1445

SAS: Grouping by ID and summing the number of a condition in a variable for the ID

I have a dataset that contains the ID and a variable called CC. The CC holds multiple numbered values where each value represents something. It looks like this:

enter image description here

An ID can have the same CC in multiple rows, I just want to flag if the CC exists or not so even if Joe had five rows stating that he has CC equal to 3 I just want a 1 or 0 stating if Joe ever had a CC equal to 3.

I want it to look like this:

enter image description here

I tried coding it as shown below but the issue is that although I know an ID can have more than one type of CC the final dataset that's created from the code only shows 1 CC for each ID that is filled. I think maybe it's overwriting it? Also I should note that prior to this code I created the CC Flag variables and filled it all as zeros.

proc sql;
    DROP TABLE Flagged_CCs;
    CREATE TABLE Flagged_CCs AS
    select 
    ID,
    COUNT(ID) as count_ID,
    case when CC=1 then 1 end as CC_1,
    case when CC=2 then 1 end as CC_2,
    case when CC=3 then 1 end as CC_3 
  from Original_Dataset 
   group by ID;
quit;

Any help is appreciated, thank you.

Upvotes: 1

Views: 2040

Answers (2)

Reeza
Reeza

Reputation: 21294

If you're looking for a report here's one method, using PROC TABULATE.

proc format ;
value indicator_fmt
low - 0, . = 0
0 - high = 1;
run;

proc tabulate data=have;
class id cc;
table id , cc*N=''*f=indicator_fmt.;
run;

Your output will look like this then:

enter image description here

If you want a fully dynamic approach in a table where you don't need to know anything ahead of time, such as the number of CC's this is a different approach. It's a bit longer but the dynamic part makes it possibly worthwhile to implement.

enter image description here

Upvotes: 1

Wired604
Wired604

Reputation: 370

Is your issue the fact that after running your new code you still get multiple line per ID?

If so I propose this:

 proc sql;
      DROP TABLE Flagged_CCs;
      CREATE TABLE Flagged_CCs AS
      select ID
           ,case when CC_1 >0 then 1 else 0 end as CC_1
           ,case when CC_2 >0 then 1 else 0 end as CC_2
           ,case when CC_3 >0 then 1 else 0 end as CC_3
          from (
               select 
               ID,
               COUNT(ID) as count_ID,
               sum(case when CC=1 then 1 end) as CC_1,
               sum(case when CC=2 then 1 end) as CC_2,
               sum(case when CC=3 then 1 end) as CC_3 
               from Original_Dataset 
               group by ID 
           );
        quit;

The reason you are having the issue is that you are only aggregating the count of ID and not the other values, using an aggregate on them will eliminate duplicate records.

Hope this helps

Upvotes: 1

Related Questions