sonarclick
sonarclick

Reputation: 73

How to select rows from a dataset with specific criteria applied to each subset in SAS Enterprise Guide

I have the following table:

COMPANY_NAME | GROUP | COUNTRY | STATUS  
COM1         |   1   |    DE   | DELETED   
COM2         |   1   |    DE   | REMAINING  
COM3         |   1   |    UK   | DELETED  
COM4         |   2   |    ES   | DELETED  
COM5         |   2   |    FR   | DELETED  
COM6         |   3   |    RO   | DELETED  
COM7         |   3   |    BG   | DELETED  
COM8         |   3   |    ES   | REMAINING  
COM9         |   3   |    ES   | DELETED 

I need to get:

COMPANY_NAME | GROUP | COUNTRY | STATUS  
COM3         |   1   |    UK   | DELETED  
COM4         |   2   |    ES   | DELETED  
COM5         |   2   |    FR   | DELETED  
COM6         |   3   |    RO   | DELETED  
COM7         |   3   |    BG   | DELETED

So, I need all entries where the status is DELETED and within each GROUP there is no COMPANY_NAME that has a status of REMAINING for the same country as the DELETED status. I can use a PROC SQL or a DATA step.

What I have tried so far is:

PROC SQL;
CREATE TABLE WORK.OUTPUT AS
SELECT *
FROM WORK.INPUT
WHERE STATUS = 'DELETED' AND COUNTRY NOT IN (SELECT COUNTRY FROM WORK.INPUT WHERE STATUS = 'REMAINING');
QUIT;

but this obviously excludes all REMAINING countries from all GROUPs.

I also tried a data step:

DATA WORK.OUTPUT;
SET WORK.INPUT;
BY GROUP COUNTRY;

IF NOT (STATUS = 'DELETED' AND COUNTRY NOT IN (COUNTRY WHERE STATUS = 'REMAINING')) THEN DELETE; 

RUN;

but the syntax is incorrect because I don't know the proper way of writing this.

Upvotes: 0

Views: 1300

Answers (2)

Amir
Amir

Reputation: 1000

Your solutions show you are thinking along the right lines.

One data step solution is:

data want(drop = remain_list);
   length remain_list $ 20;

   do until(last.group);
      set have;
      by group;

      if status = 'REMAINING' and not find(remain_list, country) then
         remain_list = catx(' ', remain_list, country);
   end;

   do until(last.group);
      set have;
      by group;

      if status = 'DELETED' and not find(remain_list, strip(country)) then
         output;
   end;
run;

Upvotes: 1

G.Arima
G.Arima

Reputation: 1171

Try this out:

proc sql;
select * from your_table
where status = 'deleted' and 
      catx("_",country,group) not in 
         (select catx("_",country,group) from your_table where status='remaining');
quit;  

Output:

company_name | group | country | status
com3         |   1   |    UK   | deleted
com4         |   2   |    ES   | deleted
com5         |   2   |    FR   | deleted
com6         |   3   |    RO   | deleted
com7         |   3   |    BG   | deleted

Upvotes: 3

Related Questions