LonelySoul
LonelySoul

Reputation: 1222

Remove observation with only one variable different

This might be easy to do but I am finding it difficult to do in SAS.

I want to remove an observation with all variable same but one. I just want to keep one observation than two.

Example

DATA auto ;
  INPUT make $  mpg $ rep78 $ weight $ foreign $;
CARDS ;
AMC     22 3 2930 0
AMC     22 3 2930 1
AMC     22 3 2930 0
AMC     22 3 2930 1
Audi    23 5 2830 1
Audi    23 3 2070 1
;
RUN ;

In the AMC we have 3 entries with last two matching except in foreign where its 0 and 1. Now I want to keep only one of them.

Example is taken from the following website

With the updated example I need to clean AMC while keeping only one 1 and one 0. Currently it has 4 entries.

http://www.ats.ucla.edu/stat/sas/modules/sort.htm

Upvotes: 2

Views: 177

Answers (3)

scott
scott

Reputation: 2265

From your updated example, I think what you want is all the distinct rows in your dataset. This can be done using either a data step or proc sql:

PROC SORT DATA=auto OUT=auto2 NODUPLICATES ;
  BY _all_ ;
RUN ;

NODUPLICATES removes any duplicate rows, _all_ specifies for it to check all fields in the dataset.

proc sql noprint;
    create table auto2 as
    select distinct *
    from auto
    ;
quit;

* is a wildcard to specify all columns in the dataset and distinct removes duplicate rows

Upvotes: 0

Yick Leung
Yick Leung

Reputation: 1078

I assume that your data set has been sorted using proc sort such that the observations that have all variables the same except one are right next to each other like in your example.

 data work.merged;
    merge work.auto
    work.auto(rename=(make=make2 mpg=mpg2 rep78=rep2 weight=weight2
    foreign=foreign2) firstobs=2);
    x = 0;

    if (make=make2) then x + 1;
    if (mpg=mpg2) then x + 1;
    if (rep78=rep2) then x + 1;
    if (weight=weight2) then x + 1;
    if (foreign=foreign2) then x + 1;
    if x = 4 then delete;
run;

Output:

    Obs   make   mpg   rep78   weight   foreign   make2   mpg2   rep2   weight2   foreign2   x

     1    AMC    22      3      2930       0      AMC      22     3      2930        0       5
     2    AMC    22      3      2930       1      Audi     23     5      2830        1       1
     3    Audi   23      5      2830       1      Audi     23     3      2070        1       3
     4    Audi   23      3      2070       1                                                 0

We will compare the merged variables and if they are the same we will increment x by 1. In this case, we can increment variable x from 0 up to a maximum of 5 every time. Since there are 5 variables in this data set, x = 5 is the maximum and x = 0 is minimum. If all of them are the same except one, then x = 4. In the case of x = 4, we can delete it.

Upvotes: 1

Yick Leung
Yick Leung

Reputation: 1078

If you just want to keep one of them that are next to each other, we can merge the data set with itself with no by statement starting with the 2nd observation. It is likely that there are other ways of doing it (e.g. proc sql), instead of doing it by brute force from the data step.

data work.merged;
    merge work.auto
    work.auto(rename=(make=make2 mpg=mpg2 rep78=rep2 weight=weight2
    foreign=foreign2) firstobs=2);

    if make=make2 and mpg=mpg2 and rep78=rep2 and weight=weight2
    and foreign NE foreign2 then delete;

    if make=make2 and mpg=mpg2 and rep78=rep2 and weight NE weight2
    and foreign=foreign2 then delete;

    if make=make2 and mpg=mpg2 and rep78 NE rep2 and weight=weight2
    and foreign=foreign2 then delete;

    if make=make2 and mpg NE mpg2 and rep78=rep2 and weight=weight2
    and foreign=foreign2 then delete;

    if make NE make2 and mpg=mpg2 and rep78=rep2 and weight=weight2
    and foreign=foreign2 then delete;

run; 

proc print data=work.merged;
    var make mpg rep78 weight foreign;
run;

In the end we can compare the merged variables using conditional statements to determine which ones have all variables the same except one.

Output of proc print w/o var statement:

 Obs    make    mpg    rep78    weight    foreign    make2    mpg2    rep2    weight2    foreign2

  1     AMC     22       3       2930        0       AMC       22      3       2930         0
  2     AMC     22       3       2930        1       Audi      23      5       2830         1
  3     Audi    23       5       2830        1       Audi      23      3       2070         1
  4     Audi    23       3       2070        1

Upvotes: 1

Related Questions