SAS: Grouping by ID and summing the number of a condition in a variable for the ID

Question

I have a dataset that contains the ID and a variable called CC. The CC holds multiple numbered values where each value represents something. It looks like this:

An ID can have the same CC in multiple rows, I just want to flag if the CC exists or not so even if Joe had five rows stating that he has CC equal to 3 I just want a 1 or 0 stating if Joe ever had a CC equal to 3.

I want it to look like this:

I tried coding it as shown below but the issue is that although I know an ID can have more than one type of CC the final dataset that's created from the code only shows 1 CC for each ID that is filled. I think maybe it's overwriting it? Also I should note that prior to this code I created the CC Flag variables and filled it all as zeros.

proc sql;
    DROP TABLE Flagged_CCs;
    CREATE TABLE Flagged_CCs AS
    select 
    ID,
    COUNT(ID) as count_ID,
    case when CC=1 then 1 end as CC_1,
    case when CC=2 then 1 end as CC_2,
    case when CC=3 then 1 end as CC_3 
  from Original_Dataset 
   group by ID;
quit;

Any help is appreciated, thank you.

Wired604 · Accepted Answer

Is your issue the fact that after running your new code you still get multiple line per ID?

If so I propose this:

 proc sql;
      DROP TABLE Flagged_CCs;
      CREATE TABLE Flagged_CCs AS
      select ID
           ,case when CC_1 >0 then 1 else 0 end as CC_1
           ,case when CC_2 >0 then 1 else 0 end as CC_2
           ,case when CC_3 >0 then 1 else 0 end as CC_3
          from (
               select 
               ID,
               COUNT(ID) as count_ID,
               sum(case when CC=1 then 1 end) as CC_1,
               sum(case when CC=2 then 1 end) as CC_2,
               sum(case when CC=3 then 1 end) as CC_3 
               from Original_Dataset 
               group by ID 
           );
        quit;

The reason you are having the issue is that you are only aggregating the count of ID and not the other values, using an aggregate on them will eliminate duplicate records.

Hope this helps

SAS: Grouping by ID and summing the number of a condition in a variable for the ID

Answers (2)

Related Questions