Reputation: 593
For the following data I am trying to filter rows, of each group ID, based on these conditions:
type='B' and value='Y'
do the following
type='F' and value='Y'
.Can we create the flag variable as shown in my want dataset? so that I can filter on Flag='Y'?
Have
ID Type Date Value
001 F 1/2/2018 Y
001 B 1/3/2018
001 B 1/4/2018 Y
001 B 1/5/2018
001 B 1/6/2018
001 F 1/6/2018 Y
001 B 1/6/2018
001 B 1/7/2018
001 B 1/8/2018 Y
001 B 1/8/2018
001 B 1/9/2018
002 F 1/2/2018 Y
002 B 1/3/2018
002 B 1/4/2018
Want
ID Type Date Value Flag
001 F 1/2/2018 Y Y
001 B 1/3/2018 Y
001 B 1/4/2018 Y Y
001 B 1/5/2018
001 B 1/6/2018
001 F 1/6/2018 Y Y
001 B 1/6/2018 Y
001 B 1/7/2018 Y
001 B 1/8/2018 Y Y
001 B 1/8/2018
001 B 1/9/2018
002 F 1/2/2018 Y Y
002 B 1/3/2018 Y
002 B 1/4/2018 Y
I tried to do the following
data F;
set have;
where Type='F';run;
data B;
set have;
where Type='B';run;
proc sql;
create table all as select
a.* from B as b
inner join F as f
on a.id=b.id
and b.date >= a.date;
quit;
This includes all the rows from my have dataset. Any help is much appreciated.
Upvotes: 3
Views: 1523
Reputation: 27526
The criteria for computing the state of a row as part of a contiguous sub-group (call it a 'run' of rows) within group ID
are relatively simple, but a compromised state might occur or be indicated if some funny cases of data occur:
data want(drop=run_:);
SET have;
BY id;
run_first = (type='F' and value='Y');
run_final = (type='B' and value='Y');
* set flag state at criteria for start of contiguous sub-group criteria;
run_flag + run_first;
if first.id and NOT run_flag then
put 'WARNING: first row in group ' id= ' is not F Y, this may be incorrect';
if run_flag > 1 and run_first then
put 'WARNING: an additional F Y before a B Y at row ' _n_;
if run_flag then
OUTPUT;
if run_flag = 0 and run_final then
put 'WARNING: an additional B Y before a F Y at row ' _n_;
* reset flag at criteria for contiguous sub-group;
if last.id or run_final then
run_flag = 0;
run;
Upvotes: 1
Reputation: 33
I have a solution but it is not the most elegant (and might not cover corner cases.) If anyone else has a better solution please share.
First, to create the dataset in-case anyone else want to try it out:
Data work.have;
input @01 ID 3.
@05 Type $1.
@07 Date date7.
@18 Value $1.;
format ID 3.
Type $1.
Date date11.
Value $1.;
datalines;
001 F '02Jan18'n Y
001 B '03Jan18'n
001 B '04Jan18'n Y
001 B '05Jan18'n
001 B '06Jan18'n
001 F '06Jan18'n Y
001 B '06Jan18'n
001 B '07Jan18'n
001 B '08Jan18'n Y
001 B '08Jan18'n
001 B '09Jan18'n
002 F '02Jan18'n Y
002 B '03Jan18'n
002 B '04Jan18'n
;
run;
Solution: I based on your edited suggestion of creating a flag variable.
Data Flag;
set work.have;
if Type = 'B' and Value = 'Y' then
flag + 1;
if Type = 'F' then
flag = 0;
if Value ne 'Y' and flag = 1 then delete;
run;
The flag variable is 0 by default.
The first IF-Then condition identifies the Type B ='Y' rows and flag them as 1, as well as retaining this flag for the subsequent rows.
The second IF-Then condition identifies the type='F' row and resets the Flag to 0
The Last If-Then condition drops all rows with Flag=1 except the first occurrence which are the Type B ='Y' rows.
I hope this applies to your problem.
Upvotes: 1
Reputation: 33
Same as Richard, I don't quite understand what the filtering criteria are.
I could see one problem with your join. you used a.* in your select statement, but "b" and "f" as your dataset aliases. this would not work as no dataset have been assigned to alias "a".
Proper way would be as follow:
proc sql;
create table all as
select b.* from B as b
inner join F as f
on b.id=f.id
and b.date >= f.date;
quit;
However, even then, I don't believe inner join is the proper way to solve your problem. Do let us your filtering condition please?
Upvotes: 1