Reputation: 1445
I have a dataset that contains the ID and a variable called CC. The CC holds multiple numbered values where each value represents something. It looks like this:
An ID can have the same CC in multiple rows, I just want to flag if the CC exists or not so even if Joe had five rows stating that he has CC equal to 3 I just want a 1 or 0 stating if Joe ever had a CC equal to 3.
I want it to look like this:
I tried coding it as shown below but the issue is that although I know an ID can have more than one type of CC the final dataset that's created from the code only shows 1 CC for each ID that is filled. I think maybe it's overwriting it? Also I should note that prior to this code I created the CC Flag variables and filled it all as zeros.
proc sql;
DROP TABLE Flagged_CCs;
CREATE TABLE Flagged_CCs AS
select
ID,
COUNT(ID) as count_ID,
case when CC=1 then 1 end as CC_1,
case when CC=2 then 1 end as CC_2,
case when CC=3 then 1 end as CC_3
from Original_Dataset
group by ID;
quit;
Any help is appreciated, thank you.
Upvotes: 1
Views: 2040
Reputation: 21294
If you're looking for a report here's one method, using PROC TABULATE.
proc format ;
value indicator_fmt
low - 0, . = 0
0 - high = 1;
run;
proc tabulate data=have;
class id cc;
table id , cc*N=''*f=indicator_fmt.;
run;
Your output will look like this then:
If you want a fully dynamic approach in a table where you don't need to know anything ahead of time, such as the number of CC's this is a different approach. It's a bit longer but the dynamic part makes it possibly worthwhile to implement.
Upvotes: 1
Reputation: 370
Is your issue the fact that after running your new code you still get multiple line per ID?
If so I propose this:
proc sql;
DROP TABLE Flagged_CCs;
CREATE TABLE Flagged_CCs AS
select ID
,case when CC_1 >0 then 1 else 0 end as CC_1
,case when CC_2 >0 then 1 else 0 end as CC_2
,case when CC_3 >0 then 1 else 0 end as CC_3
from (
select
ID,
COUNT(ID) as count_ID,
sum(case when CC=1 then 1 end) as CC_1,
sum(case when CC=2 then 1 end) as CC_2,
sum(case when CC=3 then 1 end) as CC_3
from Original_Dataset
group by ID
);
quit;
The reason you are having the issue is that you are only aggregating the count of ID and not the other values, using an aggregate on them will eliminate duplicate records.
Hope this helps
Upvotes: 1