Reputation: 735
I have a sas
data set consisting of more than 100 variables. Variables pid
- a character variable and year
- a numeric variable identify observations in the data set.
How can I create a new data set consisting of observations that have unique pid
and year
combination. That is, if a given pid
and year
combination occurs more than once, I want to delete all the associated observations, not just the duplicates.
Upvotes: 0
Views: 670
Reputation: 21264
Use the UNIQUEOUT and NOUNIQUEKEY option in proc sort for a single step solution.
data class;
set sashelp.class;
run;
proc sort data=class nouniquekey uniqueout=unique_data;
by sex age;
run;
Upvotes: 2
Reputation: 176
I don't use much of data step. I use proc sql and is easy for me.
proc sql;
create table new_dataset as
select * from old_dataset as a
join
(select pid, year, count(1) from old_dataset group by pid, year having count(1)<2)
as b on a.pid=b.pid and a.year=b.year;
run;
inner query only gets pid and year which occur once. Any multiple occurrence of pid and year are not taken into account because of having count(1)<2
. I get those observations only from original by joining back on pid and year. This actually doesn't need sorting.
Let me know in case of any questions.
Upvotes: 3
Reputation: 7602
Simple use of first.
and last.
in a data step will do this. Run proc sort
if the data is not already sorted by pid and year.
proc sort data=have;
by pid year;
run;
data want;
set have;
by pid year;
if first.year and last.year then output; /* only keep unique rows */
run;
Upvotes: 2
Reputation: 9569
You can generate a dataset containing combinations of pid and year that appear more than once, then merge it with the rest to remove matches:
proc sort data = have nodupkey dupout = duplicates;
by pid year;
run;
data want;
merge have
duplicates(in = a keep = pid year);
by pid year;
if not(a);
run;
Upvotes: 1