Reputation: 95
Refer to below table, an ID is considered as complete if at least one of its group having Day 1 to Day 3 (Duplicate allowed). I need to remove ID which has Group not having full Day 1 to Day 3.
ID Group Day
1 A 1
1 A 1
1 A 2
1 A 3
1 B 1
1 B 3
2 A 1
2 A 3
2 B 2
Expected result
ID Group Day
1 A 1
1 A 1
1 A 2
1 A 3
1 B 1
1 B 3
With this reference, Delete the group that none of its observation contain the certain value in SAS I have tried below code but it cannot remove ID 2.
PROC SQL;
CREATE TABLE TEMP AS SELECT
* FROM HAVE
GROUP BY ID
HAVING MIN(DAY)=1 AND MAX(DAY)=3
;QUIT;
PROC SQL;
CREATE TABLE TEMP1 AS SELECT
* FROM TEMP WHERE ID IN
(SELECT ID FROM TEMP
WHERE DAY=2)
;QUIT;
Upvotes: 0
Views: 516
Reputation: 51566
So you want to find the set of ID values where the ID has at least one GROUP that has all three DAY values? Find the list of IDs as a subquery and use it to subset the original data.
The key thing in subquery is you want there to be 3 distinct values of DAY. If your data could have other values of DAY (like missing or 4) then use a WHERE clause to only keep the values you want to count.
proc sql;
create table want as
select * from have
where id in
(select id from have
where day in (1,2,3)
group by id,group
having count(distinct day)=3
)
;
quit;
Upvotes: 1
Reputation: 12849
You can query the dataset with a removal list. For example:
proc sql noprint;
create table want as
select *
from have
where cats(group, id) NOT IN(select cats(group, id) from removal_list)
;
quit;
Creating the Removal List
This method will prevent you from having to do a Cartesian product on all IDs, groups, and days to create your removal list.
Assume that your data is sorted by ID
, group
, and day
.
Code:
data removal_list;
set have;
by ID Group Day;
retain flag_remove_group;
lag_day = lag(day);
/* Reset flag_remove_group at the start of each (ID, Group).
Check if the first day is > 1. If it is, set the removal flag.
*/
if(first.group) then do;
call missing(lag_day);
if(day > 1) then flag_remove_group = 1;
else flag_remove_group = 0;
end;
/* If it's not the first (ID, Group), check if days
are skipped between observations
*/
if(NOT first.group AND (day - lag_day) NE 1) then flag_remove_group = 1;
if(flag_remove_group) then output;
keep id group;
run;
Upvotes: 0