newRuser
newRuser

Reputation: 95

Delete group when its observation does not contain certain values in SAS

Refer to below table, an ID is considered as complete if at least one of its group having Day 1 to Day 3 (Duplicate allowed). I need to remove ID which has Group not having full Day 1 to Day 3.

ID   Group     Day
1     A        1
1     A        1
1     A        2
1     A        3
1     B        1
1     B        3
2     A        1
2     A        3
2     B        2

Expected result

ID   Group     Day
1     A        1
1     A        1
1     A        2
1     A        3
1     B        1
1     B        3

With this reference, Delete the group that none of its observation contain the certain value in SAS I have tried below code but it cannot remove ID 2.

PROC SQL;
CREATE TABLE TEMP AS SELECT
* FROM HAVE
GROUP BY ID
HAVING MIN(DAY)=1 AND MAX(DAY)=3
;QUIT;

PROC SQL;
CREATE TABLE TEMP1 AS SELECT
* FROM TEMP WHERE ID IN
(SELECT ID FROM TEMP
WHERE DAY=2)
;QUIT;

Upvotes: 0

Views: 516

Answers (2)

Tom
Tom

Reputation: 51566

So you want to find the set of ID values where the ID has at least one GROUP that has all three DAY values? Find the list of IDs as a subquery and use it to subset the original data.

The key thing in subquery is you want there to be 3 distinct values of DAY. If your data could have other values of DAY (like missing or 4) then use a WHERE clause to only keep the values you want to count.

proc sql;
create table want as
  select * from have
  where id in 
   (select id from have 
    where day in (1,2,3)
    group by id,group
    having count(distinct day)=3
   )
;
quit;

Upvotes: 1

Stu Sztukowski
Stu Sztukowski

Reputation: 12849

You can query the dataset with a removal list. For example:

proc sql noprint;
    create table want as
        select *
        from have
        where cats(group, id) NOT IN(select cats(group, id) from removal_list)
    ;
quit;

Creating the Removal List

This method will prevent you from having to do a Cartesian product on all IDs, groups, and days to create your removal list.

Assume that your data is sorted by ID, group, and day.

  1. For each ID, the first day in the group must be 1
  2. For each ID, all days in the group after the first day must have a difference of 1 from the previous day

Code:

data removal_list;
    set have;
    by ID Group Day;
    retain flag_remove_group;

    lag_day = lag(day);

    /* Reset flag_remove_group at the start of each (ID, Group).
       Check if the first day is > 1. If it is, set the removal flag.
    */
    if(first.group) then do;
        call missing(lag_day);

        if(day > 1) then flag_remove_group = 1;
            else flag_remove_group = 0;
    end;

    /* If it's not the first (ID, Group), check if days 
       are skipped between observations 
    */
    if(NOT first.group AND (day - lag_day) NE 1) then flag_remove_group = 1;

    if(flag_remove_group) then output;

    keep id group;
run;

Upvotes: 0

Related Questions