Reputation: 1222
This might be easy to do but I am finding it difficult to do in SAS.
I want to remove an observation with all variable same but one. I just want to keep one observation than two.
Example
DATA auto ;
INPUT make $ mpg $ rep78 $ weight $ foreign $;
CARDS ;
AMC 22 3 2930 0
AMC 22 3 2930 1
AMC 22 3 2930 0
AMC 22 3 2930 1
Audi 23 5 2830 1
Audi 23 3 2070 1
;
RUN ;
In the AMC we have 3 entries with last two matching except in foreign where its 0 and 1. Now I want to keep only one of them.
Example is taken from the following website
With the updated example I need to clean AMC while keeping only one 1 and one 0. Currently it has 4 entries.
http://www.ats.ucla.edu/stat/sas/modules/sort.htm
Upvotes: 2
Views: 177
Reputation: 2265
From your updated example, I think what you want is all the distinct rows in your dataset. This can be done using either a data step or proc sql:
PROC SORT DATA=auto OUT=auto2 NODUPLICATES ;
BY _all_ ;
RUN ;
NODUPLICATES
removes any duplicate rows, _all_
specifies for it to check all fields in the dataset.
proc sql noprint;
create table auto2 as
select distinct *
from auto
;
quit;
*
is a wildcard to specify all columns in the dataset and distinct removes duplicate rows
Upvotes: 0
Reputation: 1078
I assume that your data set has been sorted using proc sort
such that the observations that have all variables the same except one are right next to each other like in your example.
data work.merged;
merge work.auto
work.auto(rename=(make=make2 mpg=mpg2 rep78=rep2 weight=weight2
foreign=foreign2) firstobs=2);
x = 0;
if (make=make2) then x + 1;
if (mpg=mpg2) then x + 1;
if (rep78=rep2) then x + 1;
if (weight=weight2) then x + 1;
if (foreign=foreign2) then x + 1;
if x = 4 then delete;
run;
Output:
Obs make mpg rep78 weight foreign make2 mpg2 rep2 weight2 foreign2 x
1 AMC 22 3 2930 0 AMC 22 3 2930 0 5
2 AMC 22 3 2930 1 Audi 23 5 2830 1 1
3 Audi 23 5 2830 1 Audi 23 3 2070 1 3
4 Audi 23 3 2070 1 0
We will compare the merged
variables and if they are the same we will increment x
by 1. In this case, we can increment variable x
from 0 up to a maximum of 5 every time. Since there are 5 variables in this data set, x = 5 is the maximum and x = 0 is minimum. If all of them are the same except one, then x = 4. In the case of x = 4, we can delete it.
Upvotes: 1
Reputation: 1078
If you just want to keep one of them that are next to each other, we can merge
the data set with itself with no by
statement starting with the 2nd observation. It is likely that there are other ways of doing it (e.g. proc sql), instead of doing it by brute force from the data step.
data work.merged;
merge work.auto
work.auto(rename=(make=make2 mpg=mpg2 rep78=rep2 weight=weight2
foreign=foreign2) firstobs=2);
if make=make2 and mpg=mpg2 and rep78=rep2 and weight=weight2
and foreign NE foreign2 then delete;
if make=make2 and mpg=mpg2 and rep78=rep2 and weight NE weight2
and foreign=foreign2 then delete;
if make=make2 and mpg=mpg2 and rep78 NE rep2 and weight=weight2
and foreign=foreign2 then delete;
if make=make2 and mpg NE mpg2 and rep78=rep2 and weight=weight2
and foreign=foreign2 then delete;
if make NE make2 and mpg=mpg2 and rep78=rep2 and weight=weight2
and foreign=foreign2 then delete;
run;
proc print data=work.merged;
var make mpg rep78 weight foreign;
run;
In the end we can compare the merged
variables using conditional statements to determine which ones have all variables the same except one.
Output of proc print w/o var statement:
Obs make mpg rep78 weight foreign make2 mpg2 rep2 weight2 foreign2
1 AMC 22 3 2930 0 AMC 22 3 2930 0
2 AMC 22 3 2930 1 Audi 23 5 2830 1
3 Audi 23 5 2830 1 Audi 23 3 2070 1
4 Audi 23 3 2070 1
Upvotes: 1