user3934760
user3934760

Reputation: 95

Remove an observation (record) if it repeats in the variable (column)

This is my data:

enter image description here

I like to write a code that removes the second observation (highlighted) IF Var1 repeats AND Var3 is 1. So the my resulting table would look like this:

enter image description here

Also, these repetitions might pop up in any order. For example, the Var3 can be "1" in the begining, middle, as well as the end of a repetition series.

Thank you

Upvotes: 0

Views: 79

Answers (3)

Dominic Comtois
Dominic Comtois

Reputation: 10401

Making sure you sort your data properly, you could do this (not tested):

data want;
  set have;
  sortLast = (Var3 = 1);
run;

proc sort data=want;
  by Var1 sortLast;
run

data want;
  set want;
  if _N_ > 1 then do;
    Var1_lag = lag(Var1);
    if Var1 = Var1_lag and Var3 = 1 then delete;
  end;
  drop sortLast Var1_lag;
run;

Upvotes: 1

Richard
Richard

Reputation: 27498

Presuming the data is sorted by var1 already, the var3=1 rows can be simply removed

data want;
  set have;
  by var1;
  if var3=1 
     and not (first.var1 and last.var1) /* test for group with more than one row */
  then
    delete;
run;

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269443

If I understand correctly, you want to keep "1" values only when they are the only value for a give var1. If that is correct, one say to do this in SQL is:

select t.*
from t
where var3 <> 1 or
      not exists (select 1 from t t2 where t2.var1 = t.var1 and t2.var3 <> 1);

Upvotes: 1

Related Questions