Reputation: 614
Lets say we have the following data set
obs check result
---------------------
aa 0 0
bb 0 0
cc 1 0
dd 0 0
ee 1 1
ff 1 1 <- we are here. Due to prev. value both eq 1 the "result" variable for both of them should also be 1
What I need is to check if 2 or more cases in "check" variable in a row are equal to 1 and if so, "result" variable for both of them should also be 1. The problem is that the only way I see to set "result" to 1 for both of them is to somehow go back to the previous "observed row" and change "result" variable there to 1 also.
Upvotes: 2
Views: 970
Reputation: 63424
Yick's answer is a very good general approach, but this specific problem has an interesting, simpler answer.
There is one way in SAS of looking ahead without having to read the data a second time in some fashion: The LAST.<variable>
automatic variables created by using a BY
statement. These involve an automatic lookahead that SAS does without forcing you to do it by hand. Thanks to NOTSORTED
, this lets you see if the next value is precisely equal to the current value for a variable.
Using Yick's dataset:
data work.sample;
input obs $ check result;
retain x;
output;
x = check;
datalines;
aa 0 0
bb 0 0
cc 1 0
dd 0 0
ee 1 0
ff 1 0
;
run;
data want;
set sample;
by check notsorted;
if not (first.check and last.check) and (check=1) then result=1;
*if it is not the first record AND the last record with the same check value in a row,
then it is one of a sequence of two or more and if check=1 then result should be 1;
put _all_;
run;
Upvotes: 2
Reputation: 1078
Because of the way the Data step loop is designed, I don't think we can go back to a previous observation that easily. However, we can use a retain
statement and merge
ing the data set with itself without
a by
statement to access previous or next observations:
data work.sample;
input obs $ check result;
retain x;
output;
x = check;
datalines;
aa 0 0
bb 0 0
cc 1 0
dd 0 0
ee 1 0
ff 1 0
;
run;
data work.sample2;
merge work.sample
work.sample(firstobs = 2 keep = check
rename = (check = c));
if (check = 1 and x = 1) or (check = 1 and c = 1) then result = 1;
run;
OUTPUT would look like this:
Obs obs check result x c
1 aa 0 0 . 0
2 bb 0 0 0 1
3 cc 1 0 0 0
4 dd 0 0 1 1
5 ee 1 1 0 1
6 ff 1 1 1 .
So, variable x represents the values that are retained using the retain
statement and output
statement. variable c represents the variable that we created when we merged the data set with itself beginning when observation = 2.
Lastly, we follow this up with a conditional statement to put the new values into result variable if there exists a consecutive occurrence of check = 1.
Upvotes: 3