Melodie Hallett
Melodie Hallett

Reputation: 1

Using SAS with multiple rows per ID to retain all records of that ID if any of them contain State=MI

I have a large dataset with columns as follows:

| ID |State |
| 1  |   MI |
| 1  |   CA |
| 2  |   CA |
| 2  |   CA |
| 3  |   WY |
| 3  |   WY |

I want to keep the following records in a new data set, if any ID contains CA in state, then keep all of their records otherwise, delete those rows.

RESULT

| ID |State |
| 1  |   MI |
| 1  |   CA |
| 2  |   CA |
| 2  |   CA |

How can I do this with SAS?

    libname sql 'D:/Ro/';
    proc sql;
       create table sql.idstate
              (state char(2),           /* 2–character column for               */
                                        /* state abbreviation                   */
                 ID num                 /* column for date of entry into the US */
                );

Created some sort of table, but it's empty - I think I need to add a data statement somewhere?

Upvotes: 0

Views: 78

Answers (2)

Richard
Richard

Reputation: 27508

You can do a reflexive inner join

proc sql;
  create table want as
  select have.*
  from 
  have join (select distinct id from have where state='CA') ca
  on have.id = ca.id
  ;

Upvotes: 0

Tom
Tom

Reputation: 51621

Assuming you meant that you have a dataset like this:

data have;
  input ID State :$2.;
cards;
1 MI
1 CA 
2 CA 
2 CA 
3 WY 
3 WY
;

Then you should be able to use a subquery with the IN operator in PROC SQL.

proc sql;
create table want as
  select * from have
  where id in (select id from have where state='CA')
;
quit;

If you are having performance issues with a large dataset and it is already sorted by ID then a data step might perform better. The data step will also have to pass through the dataset twice, but it can do it one ID at a time and so take advantage of disk caching the operating system is probably doing already.

data want;
  do until(last.id);
    set have;
    by id;
    if state='CA' then flag=1;
  end;
  do until(last.id);
    set have;
    by id;
    if flag=1 then output;
  end;
  drop flag;
run;

Upvotes: 0

Related Questions