Reputation: 423
I have a table in SAS dataset that looks like this:
proc sql;
create table my_table
(id char(1),
my_date num format=date9.,
my_col num);
insert into my_table
values('A','01JAN2010'd,.)
values('A','02JAN2010'd,0)
values('A','03DEC2009'd,1)
values('A','04NOV2009'd,1)
values('B','01JAN2010'd,.)
values('B','02NOV2009'd,2)
values('C','01JAN2010'd,.)
values('C','02OCT2009'd,3)
values('D','01JAN2010'd,.)
values('D','02NOV2009'd,2)
values('D','03OCT2009'd,1)
values('D','04AUG2009'd,2)
values('D','05MAY2009'd,3)
values('D','06APR2009'd,1);
quit;
I am trying to create a new column desired
that, for each group of id
column, flags the row with a value of 1 if the value in my_col
is missing or less than 3.
The part I'm having trouble with is that when there is a my_col
value that is greater than 2, I need the desired
value for that row to be missing and also stop flagging any remaining rows in the id
group with a value of 1.
The resulting dataset should look like this:
+----+-----------+--------+---------+
| id | my_date | my_col | desired |
+----+-----------+--------+---------+
| A | 01JAN2010 | . | 1 |
| A | 02JAN2010 | 0 | 1 |
| A | 03DEC2009 | 1 | 1 |
| A | 04NOV2009 | 1 | 1 |
| B | 01JAN2009 | . | 1 |
| B | 02NOV2009 | 2 | 1 |
| C | 01JAN2010 | . | 1 |
| C | 02OCT2009 | 3 | . |
| D | 01JAN2010 | . | 1 |
| D | 02NOV2009 | 2 | 1 |
| D | 03OCT2009 | 1 | 1 |
| D | 04AUG2009 | 2 | 1 |
| D | 05MAY2009 | 3 | . |
| D | 06APR2009 | 1 | . |
+----+-----------+--------+---------+
Upvotes: 1
Views: 269
Reputation: 51566
Looks like a simple application of a retained variable. Set the flag to 1 when you start a new group and then set it to missing when the value of MY_COL is larger than 2.
data want;
set my_table ;
by id;
if first.id then desired=1;
if my_col>2 then desired=.;
retain desired;
run;
Also it is not clear why you used such complicated code to create your example data. Why not a simple data step?
data my_table;
input id :$1. my_date :date. my_col;
format my_date date9.;
cards;
A 01JAN2010 .
A 02JAN2010 0
A 03DEC2009 1
A 04NOV2009 1
B 01JAN2010 .
B 02NOV2009 2
C 01JAN2010 .
C 02OCT2009 3
D 01JAN2010 .
D 02NOV2009 2
D 03OCT2009 1
D 04AUG2009 2
D 05MAY2009 3
D 06APR2009 1
;
Upvotes: 4
Reputation: 121
I can't think of a simpler way to do it, but this works. You will need to have your data sorted by id.
data my_table2;
set my_table;
by id;
format gt2flag $1.;
retain gt2flag;
if first.id then gt2flag='';
if my_col gt 2 then gt2flag='Y';
if gt2flag = 'Y' then desired=.;
else desired=1;
drop gt2flag;
run;
id my_date my_col desired
A 01JAN2010 . 1
A 02JAN2010 0 1
A 03DEC2009 1 1
A 04NOV2009 1 1
B 01JAN2010 . 1
B 02NOV2009 2 1
C 01JAN2010 . 1
C 02OCT2009 3 .
D 01JAN2010 . 1
D 02NOV2009 2 1
D 03OCT2009 1 1
D 04AUG2009 2 1
D 05MAY2009 3 .
D 06APR2009 1 .
Upvotes: 1