Reputation: 1
I have a large dataset with columns as follows:
| ID |State |
| 1 | MI |
| 1 | CA |
| 2 | CA |
| 2 | CA |
| 3 | WY |
| 3 | WY |
I want to keep the following records in a new data set, if any ID contains CA in state, then keep all of their records otherwise, delete those rows.
RESULT
| ID |State |
| 1 | MI |
| 1 | CA |
| 2 | CA |
| 2 | CA |
How can I do this with SAS?
libname sql 'D:/Ro/';
proc sql;
create table sql.idstate
(state char(2), /* 2–character column for */
/* state abbreviation */
ID num /* column for date of entry into the US */
);
Created some sort of table, but it's empty - I think I need to add a data statement somewhere?
Upvotes: 0
Views: 78
Reputation: 27508
You can do a reflexive inner join
proc sql;
create table want as
select have.*
from
have join (select distinct id from have where state='CA') ca
on have.id = ca.id
;
Upvotes: 0
Reputation: 51621
Assuming you meant that you have a dataset like this:
data have;
input ID State :$2.;
cards;
1 MI
1 CA
2 CA
2 CA
3 WY
3 WY
;
Then you should be able to use a subquery with the IN operator in PROC SQL.
proc sql;
create table want as
select * from have
where id in (select id from have where state='CA')
;
quit;
If you are having performance issues with a large dataset and it is already sorted by ID then a data step might perform better. The data step will also have to pass through the dataset twice, but it can do it one ID at a time and so take advantage of disk caching the operating system is probably doing already.
data want;
do until(last.id);
set have;
by id;
if state='CA' then flag=1;
end;
do until(last.id);
set have;
by id;
if flag=1 then output;
end;
drop flag;
run;
Upvotes: 0