user6610400
user6610400

Reputation:

SAS: Determine if value is in list of values grouped by two variables

I am looking for something that will group a dataset by ID1 and ID2, then go have a look if Start2 is present within Start1 for each group. I have tried using proc sql, but I can not get the grouping to work properly. See example from below:

ID1 |  ID2  |    Start1     |    Start2    |
 1  |   1   |  01-05-2015   |  01-10-2015  | Both values of Start2 are in Start1 for ID1 = 1 and ID2 = 1 and should not output. 
 1  |   1   |  01-10-2015   |  01-05-2015  | Both values of Start2 are in Start1 for ID1 = 1 and ID2 = 1 and should not output.  
 1  |   2   |  01-03-2017   |  01-09-2017  | Should output
 1  |   2   |  01-03-2017   |  01-06-2017  | Should output
 1  |   2   |  01-03-2017   |  01-03-2017  | 
 2  |   1   |  01-05-2015   |  01-10-2016  |
 2  |   1   |  01-05-2015   |  01-05-2015  |
 2  |   1   |  01-05-2015   |  01-07-2015  | Edited: Should output
 2  |   1   |  01-10-2016   |  01-10-2016  |
 2  |   1   |  01-10-2016   |  01-05-2015  |
 2  |   1   |  01-10-2016   |  01-07-2015  | Edited: Should output

So an output looking like: 

ID1 |  ID2  |    Start1     |    Start2    |
 1  |   2   |  01-03-2017   |  01-09-2017  |
 1  |   2   |  01-03-2017   |  01-06-2017  |
 2  |   1   |  01-05-2015   |  01-07-2015  |
 2  |   1   |  01-10-2016   |  01-07-2015  |

Any help would be very appreciated! My current attempt looks something like this (inspired by Egor below):

proc sql;
   create table want as 
   select *
   from have
   group by ID1, ID2
   having not Start2 in (
        select Start1
        from have
        group by ID1, ID2)
;

This seems to also give the desired output, however, when I test it on my fullsize dataset, I can see that this is missing certain rows that should be outputted.

Sample data:

data a;
infile cards dlm="|";
format Start1 Start2 DDMMYY10.;
input id1:Best. id2:Best. Start1:DDMMYY10. Start2:DDMMYY10.;
cards;
 1  |   1   |  01-05-2015   |  01-10-2015  
 1  |   1   |  01-10-2015   |  01-05-2015  
 1  |   2   |  01-03-2017   |  01-09-2017  
 1  |   2   |  01-03-2017   |  01-06-2017  
 1  |   2   |  01-03-2017   |  01-03-2017  
 2  |   1   |  01-05-2015   |  01-10-2016  
 2  |   1   |  01-05-2015   |  01-05-2015  
 2  |   1   |  01-05-2015   |  01-07-2015  
 2  |   1   |  01-10-2016   |  01-10-2016  
 2  |   1   |  01-10-2016   |  01-05-2015  
 2  |   1   |  01-10-2016   |  01-07-2015  
 ;
 run;

Upvotes: 0

Views: 309

Answers (1)

Egor Lipchinskiy
Egor Lipchinskiy

Reputation: 269

Try 2

Input data:

data a;
infile cards dlm="|";
format Start1 Start2 DDMMYY10.;
input id1:Best. id2:Best. Start1:DDMMYY10. Start2:DDMMYY10.;
cards;
 1  |   1   |  01-05-2015   |  01-10-2015  
 1  |   1   |  01-10-2015   |  01-05-2015  
 1  |   2   |  01-03-2017   |  01-09-2017  
 1  |   2   |  01-03-2017   |  01-06-2017  
 1  |   2   |  01-03-2017   |  01-03-2017  
 2  |   1   |  01-05-2015   |  01-10-2016  
 2  |   1   |  01-05-2015   |  01-05-2015  
 2  |   1   |  01-05-2015   |  01-07-2015  
 2  |   1   |  01-10-2016   |  01-10-2016  
 2  |   1   |  01-10-2016   |  01-05-2015  
 2  |   1   |  01-10-2016   |  01-07-2015  
 ;
 run;

Solution:

proc sql;
   create table c as 
   select distinct id1, id2, Start1
   from a
;
 proc sql;
   create table b as 
   select id1, id2, Start1, Start2
   from a
   where Start2 not in (select Start1 from c where a.id1=c.id1 and a.id2=c.id2)
;

Result:

1   2   01/03/2017  01/09/2017
1   2   01/03/2017  01/06/2017
2   1   01/05/2015  01/07/2015
2   1   01/10/2016  01/07/2015

Upvotes: 0

Related Questions