Reputation:
I am looking for something that will group a dataset by ID1 and ID2, then go have a look if Start2 is present within Start1 for each group. I have tried using proc sql, but I can not get the grouping to work properly. See example from below:
ID1 | ID2 | Start1 | Start2 |
1 | 1 | 01-05-2015 | 01-10-2015 | Both values of Start2 are in Start1 for ID1 = 1 and ID2 = 1 and should not output.
1 | 1 | 01-10-2015 | 01-05-2015 | Both values of Start2 are in Start1 for ID1 = 1 and ID2 = 1 and should not output.
1 | 2 | 01-03-2017 | 01-09-2017 | Should output
1 | 2 | 01-03-2017 | 01-06-2017 | Should output
1 | 2 | 01-03-2017 | 01-03-2017 |
2 | 1 | 01-05-2015 | 01-10-2016 |
2 | 1 | 01-05-2015 | 01-05-2015 |
2 | 1 | 01-05-2015 | 01-07-2015 | Edited: Should output
2 | 1 | 01-10-2016 | 01-10-2016 |
2 | 1 | 01-10-2016 | 01-05-2015 |
2 | 1 | 01-10-2016 | 01-07-2015 | Edited: Should output
So an output looking like:
ID1 | ID2 | Start1 | Start2 |
1 | 2 | 01-03-2017 | 01-09-2017 |
1 | 2 | 01-03-2017 | 01-06-2017 |
2 | 1 | 01-05-2015 | 01-07-2015 |
2 | 1 | 01-10-2016 | 01-07-2015 |
Any help would be very appreciated! My current attempt looks something like this (inspired by Egor below):
proc sql;
create table want as
select *
from have
group by ID1, ID2
having not Start2 in (
select Start1
from have
group by ID1, ID2)
;
This seems to also give the desired output, however, when I test it on my fullsize dataset, I can see that this is missing certain rows that should be outputted.
Sample data:
data a;
infile cards dlm="|";
format Start1 Start2 DDMMYY10.;
input id1:Best. id2:Best. Start1:DDMMYY10. Start2:DDMMYY10.;
cards;
1 | 1 | 01-05-2015 | 01-10-2015
1 | 1 | 01-10-2015 | 01-05-2015
1 | 2 | 01-03-2017 | 01-09-2017
1 | 2 | 01-03-2017 | 01-06-2017
1 | 2 | 01-03-2017 | 01-03-2017
2 | 1 | 01-05-2015 | 01-10-2016
2 | 1 | 01-05-2015 | 01-05-2015
2 | 1 | 01-05-2015 | 01-07-2015
2 | 1 | 01-10-2016 | 01-10-2016
2 | 1 | 01-10-2016 | 01-05-2015
2 | 1 | 01-10-2016 | 01-07-2015
;
run;
Upvotes: 0
Views: 309
Reputation: 269
Try 2
Input data:
data a;
infile cards dlm="|";
format Start1 Start2 DDMMYY10.;
input id1:Best. id2:Best. Start1:DDMMYY10. Start2:DDMMYY10.;
cards;
1 | 1 | 01-05-2015 | 01-10-2015
1 | 1 | 01-10-2015 | 01-05-2015
1 | 2 | 01-03-2017 | 01-09-2017
1 | 2 | 01-03-2017 | 01-06-2017
1 | 2 | 01-03-2017 | 01-03-2017
2 | 1 | 01-05-2015 | 01-10-2016
2 | 1 | 01-05-2015 | 01-05-2015
2 | 1 | 01-05-2015 | 01-07-2015
2 | 1 | 01-10-2016 | 01-10-2016
2 | 1 | 01-10-2016 | 01-05-2015
2 | 1 | 01-10-2016 | 01-07-2015
;
run;
Solution:
proc sql;
create table c as
select distinct id1, id2, Start1
from a
;
proc sql;
create table b as
select id1, id2, Start1, Start2
from a
where Start2 not in (select Start1 from c where a.id1=c.id1 and a.id2=c.id2)
;
Result:
1 2 01/03/2017 01/09/2017
1 2 01/03/2017 01/06/2017
2 1 01/05/2015 01/07/2015
2 1 01/10/2016 01/07/2015
Upvotes: 0