Cardinal
Cardinal

Reputation: 228

Removing lines based on conditions

I want to remove rows (or observations) in SAS based on certain conditions: If the i-th name in the name column is the same as (i+1)-th name AND the i-th date is the same as the (i+1)-th date, remove line (i+1)-th. I also want to delete rows i+2, i+3, etc. if the entries in these columns are the same as in row i. But, this could maybe done iteratively if the previous rows were deleted.

I am fairly new to SAS. I know I should use some form of loop but I can't figure out how to use it. This question has been bugging me a couple of days. I hope someone could help me out.

Upvotes: 1

Views: 153

Answers (2)

Richard
Richard

Reputation: 27508

This is a processing task that can be solved using a BY statement and automatic FIRST. by-group variable flag.

data want;
  set have;
  by date name;
  if first.name;
run;

If the by-group rows are contiguous but not sorted you can add the NOTSORTED option to the BY statement.

You will need to use additional variables in the by-group statement if the actual data structure has additional 'parent' categorical variables that further group name:dates (such as, say insuranceCompany and policyGroup, or team).

Upvotes: 3

Llex
Llex

Reputation: 1770

Test data:

+-----------+--------+-------+
| date      | name   | value |
+-----------+--------+-------+
| 10MAY2019 | param1 | 1     |
+-----------+--------+-------+
| 10MAY2019 | param1 | 2     |
+-----------+--------+-------+
| 10MAY2019 | param1 | 2     |
+-----------+--------+-------+
| 10MAY2019 | param2 | 1     |
+-----------+--------+-------+
| 10MAY2019 | param3 | 1     |
+-----------+--------+-------+
| 10MAY2019 | param4 | 1     |
+-----------+--------+-------+
| 10MAY2019 | param5 | 1     |
+-----------+--------+-------+
| 10MAY2019 | param5 | 1     |
+-----------+--------+-------+

The idea of my solution is to generate key by concatenating params and retain it.

data test;
format date date9.;
input name $6. value date yymmdd10.;
datalines4;
param1   1 2019-05-10
param1   2 2019-05-10
param1   2 2019-05-10
param2   1 2019-05-10
param3   1 2019-05-10
param4   1 2019-05-10
param5   1 2019-05-10
param5   1 2019-05-10
;;;;
run;

data want(drop=key);
   set test;
   retain key;
   if _N_ = 1 then do;
      key = strip(name) || strip(put(date,date9.));
      output;
   end;
   else if key ne strip(name) || strip(put(date,date9.)) then do;
      key = strip(name) || strip(put(date,date9.));
      output;
   end;
run;

Result:

+-----------+--------+-------+
| date      | name   | value |
+-----------+--------+-------+
| 10MAY2019 | param1 | 1     |
+-----------+--------+-------+
| 10MAY2019 | param2 | 1     |
+-----------+--------+-------+
| 10MAY2019 | param3 | 1     |
+-----------+--------+-------+
| 10MAY2019 | param4 | 1     |
+-----------+--------+-------+
| 10MAY2019 | param5 | 1     |
+-----------+--------+-------+

Upvotes: 1

Related Questions