Reputation: 35
I have the following two datasets. I want to delete the lines from A with dates that falls within the in-out period in file B.
File A
SEQ ID date var1
1 A12 03JAN04 242
2 A12 01FEB06 356
3 A12 06JAN08 325
4 A12 28DEC09 123
5 B34 06MAY03 985
6 B34 13JUN03 198
7 B34 10MAY05 241
8 C56 09NOV10 155
9 C56 19OCT13 352
File B
ID date_in date_out
A12 01JAN04 04JAN04
A12 05FEB08 08FEB08
B34 03MAY03 06MAY03
B34 09MAY05 19MAY05
C56 12JUL12 18JUL12
For example, for ID=A12, his first obs falls within 01JAN04-04JAN04, thus I delete that obs, but keep the rest.
Essentially, I should get the following results
SEQ ID date var1
2 A12 01FEB06 356
3 A12 06JAN08 325
4 A12 28DEC09 123
6 B34 13JUN03 198
8 C56 09NOV10 155
9 C56 19OCT13 352
Both file A and B has multiple records per ID, and the between in-out period in file B is not alway constant.
I think in R, I could probably write a loop through lines in A, subset out a list from B that matches the ID, then loop through that list of period from B see if date in A falls within the in-between period.
But I need to this in SAS, so I'm really clueless about where to start. How do I even loop through line-by-line in SAS? eg. should I write a macro that loops through both file some how? or use SQL somehow?
I would greatly appreciate some advice on how to approach this problem. Thanks!
Upvotes: 3
Views: 113
Reputation: 10411
PROC SQL easily takes care of that...
proc sql;
create table bad_seqs as
select A.SEQ
from A
left join B
on A.ID = B.ID
where A.date between B.date_in and B.date_out;
delete from A where SEQ in (select SEQ from bad_seqs);
quit;
Upvotes: 2