Crubal Chenxi Li
Crubal Chenxi Li

Reputation: 313

Delete part of the data in SAS

I have a dataset called a, I want to delete the rows that column with the name "AVG_OPT_PR" = '.' And the following code works:

DATA work.a1;
  SET work.a;
    IF AVG_OPT_PR = '.' THEN delete;
RUN;

However, there is another column called "Period", and I just want to delete the rows that within one Location, all "AVG_OPT_PR" = '.'. How can I do that?

For example, data looks like:

Location_id |     Period  | AVG_OPT_PR 
------------+-------------+-----------
   L_1      | 2 weeks     |  105 
   L_1      | 1 month     |   .
   ...      |    ...      |  ... 
   L_1      | 1  year     |  103 
   L_2      | 2 weeks     |   . 
   L_2      | 1 month     |   . 
   ...      |    ...      |  ... 
   L_2      | 1 year      |   .   

Hence, rows that location id equals "L_2" are deleted, but all data for "L_1" are kept.

Thank you!

Upvotes: 0

Views: 148

Answers (2)

Etheur
Etheur

Reputation: 347

If it's as simple as wanting to only delete rows with both an AVG_OPT_PR of . and a Location_id of L-2, then this should work just fine.

DATA work.a1;
  SET work.a;
    IF AVG_OPT_PR = '.' and Location_id = 'L_2' THEN delete;
RUN;

And some additional examples to expand on how you could handle this with the IF THEN DELETE

To delete all rows with AVG_OPT_PR of . and where its Location_id does not equal L_1 (so everything but L_1)

DATA work.a1;
  SET work.a;
    IF AVG_OPT_PR = '.' and Location_id ^= 'L_1' THEN delete;
RUN;

And to delete all rows with AVG_OPT_PR of . and where its Location_id is one of those listed IN ('L_2','L_3','L_4',...)

DATA work.a1;
  SET work.a;
    IF AVG_OPT_PR = '.' and Location_id IN ('L_2','L_3','L_4') THEN delete;
RUN;

Upvotes: 0

Sean
Sean

Reputation: 1120

Select the maximum value of avg_opt_pr for each location_id (if the max is missing, it will come out as .), merge it back onto the first dataset, then drop all the location_ids that have a max of missing.

proc sql;
    create table a2 as select distinct
        location_id, max(avg_opt_pr) as avg_opt_pr_max
        from a1
        group by location_id;
quit;

proc sql;
    create table a3 as select
        a.*, b.avg_opt_pr_max
        from a1 as a
        left join a2 as b
        on a.location_id = b.location_id;
    delete from a3 where missing(avg_opt_pr_max);
quit;

Upvotes: 1

Related Questions