Duna
Duna

Reputation: 735

How to create SAS data set extracting observations with unique keys

I have a sas data set consisting of more than 100 variables. Variables pid - a character variable and year - a numeric variable identify observations in the data set.

How can I create a new data set consisting of observations that have unique pid and year combination. That is, if a given pid and year combination occurs more than once, I want to delete all the associated observations, not just the duplicates.

Upvotes: 0

Views: 670

Answers (4)

Reeza
Reeza

Reputation: 21264

Use the UNIQUEOUT and NOUNIQUEKEY option in proc sort for a single step solution.

data class;
set sashelp.class;
run;

proc sort data=class nouniquekey uniqueout=unique_data;
by sex age;
run;

http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#p0qh2iuz3fa6rpn1eib1gaxr0sb5.htm

Upvotes: 2

Avinash
Avinash

Reputation: 176

I don't use much of data step. I use proc sql and is easy for me.

proc sql;
    create table new_dataset as
    select * from old_dataset as a
      join
    (select pid, year, count(1) from old_dataset group by pid, year having count(1)<2)
    as b on a.pid=b.pid and a.year=b.year;
run;

inner query only gets pid and year which occur once. Any multiple occurrence of pid and year are not taken into account because of having count(1)<2. I get those observations only from original by joining back on pid and year. This actually doesn't need sorting.

Let me know in case of any questions.

Upvotes: 3

Longfish
Longfish

Reputation: 7602

Simple use of first. and last. in a data step will do this. Run proc sort if the data is not already sorted by pid and year.

proc sort data=have;
by pid year;
run;

data want;
set have;
by pid year;
if first.year and last.year then output; /* only keep unique rows */
run;

Upvotes: 2

user667489
user667489

Reputation: 9569

You can generate a dataset containing combinations of pid and year that appear more than once, then merge it with the rest to remove matches:

proc sort data = have nodupkey dupout = duplicates;
    by pid year;
run;

data want;
    merge have 
          duplicates(in = a keep = pid year);
    by pid year;
    if not(a);
run;

Upvotes: 1

Related Questions