Reputation: 228
I want to remove rows (or observations) in SAS based on certain conditions: If the i-th name in the name column is the same as (i+1)-th name AND the i-th date is the same as the (i+1)-th date, remove line (i+1)-th. I also want to delete rows i+2, i+3, etc. if the entries in these columns are the same as in row i. But, this could maybe done iteratively if the previous rows were deleted.
I am fairly new to SAS. I know I should use some form of loop but I can't figure out how to use it. This question has been bugging me a couple of days. I hope someone could help me out.
Upvotes: 1
Views: 153
Reputation: 27508
This is a processing task that can be solved using a BY
statement and automatic FIRST.
by-group variable flag.
data want;
set have;
by date name;
if first.name;
run;
If the by-group rows are contiguous but not sorted you can add the NOTSORTED
option to the BY
statement.
You will need to use additional variables in the by-group statement if the actual data structure has additional 'parent' categorical variables that further group name:dates (such as, say insuranceCompany
and policyGroup
, or team
).
Upvotes: 3
Reputation: 1770
Test data:
+-----------+--------+-------+
| date | name | value |
+-----------+--------+-------+
| 10MAY2019 | param1 | 1 |
+-----------+--------+-------+
| 10MAY2019 | param1 | 2 |
+-----------+--------+-------+
| 10MAY2019 | param1 | 2 |
+-----------+--------+-------+
| 10MAY2019 | param2 | 1 |
+-----------+--------+-------+
| 10MAY2019 | param3 | 1 |
+-----------+--------+-------+
| 10MAY2019 | param4 | 1 |
+-----------+--------+-------+
| 10MAY2019 | param5 | 1 |
+-----------+--------+-------+
| 10MAY2019 | param5 | 1 |
+-----------+--------+-------+
The idea of my solution is to generate key by concatenating params and retain it.
data test;
format date date9.;
input name $6. value date yymmdd10.;
datalines4;
param1 1 2019-05-10
param1 2 2019-05-10
param1 2 2019-05-10
param2 1 2019-05-10
param3 1 2019-05-10
param4 1 2019-05-10
param5 1 2019-05-10
param5 1 2019-05-10
;;;;
run;
data want(drop=key);
set test;
retain key;
if _N_ = 1 then do;
key = strip(name) || strip(put(date,date9.));
output;
end;
else if key ne strip(name) || strip(put(date,date9.)) then do;
key = strip(name) || strip(put(date,date9.));
output;
end;
run;
Result:
+-----------+--------+-------+
| date | name | value |
+-----------+--------+-------+
| 10MAY2019 | param1 | 1 |
+-----------+--------+-------+
| 10MAY2019 | param2 | 1 |
+-----------+--------+-------+
| 10MAY2019 | param3 | 1 |
+-----------+--------+-------+
| 10MAY2019 | param4 | 1 |
+-----------+--------+-------+
| 10MAY2019 | param5 | 1 |
+-----------+--------+-------+
Upvotes: 1