Damien Dotta
Damien Dotta

Reputation: 939

How to delete some observations in some groups based on a condition

I would like delete some observations in some groups based on a condition. My condition is each group : if (imput="toto") then delete but if a whole group has imput="toto" then keep it

data temp;
input siren $ class $ imput $;
cards;
A CP titi
B CP toto
C CE tata
D CE tata
F CM toto
G CM toto
H SU tata
I SU toto
;
run;

The output I want :

siren class imput
A CP titi
C CE tata
D CE tata
F CM toto
G CM toto
H SU tata

Many thanks in advance !

Upvotes: 1

Views: 102

Answers (3)

In SQL server you can achieve this through below query:

with cte as(
select * ,count(*) over(partition by class,input order by class,input)classwiseinputwisecount,
count(*) over(partition by class order by class )classwisecount
from data)
select * from data where not exists (
select siren,class,input from cte where classwiseinputwisecount<>classwisecount and input='toto' and cte.siren=data.siren)

Simplified version:

Proc SQL;
    select * from data where not exists (
    select siren,class,input from (
    select class,input ,count(*) classwiseinputwisecount,(select count(*) from data d where d.class=data.class)classwisecount
    
    from data
    group by class,input)cte
    where classwiseinputwisecount<>classwisecount and input='toto' and cte.class=data.class and cte.input=data.input);
    Quit;

result: enter image description here

Upvotes: 1

Tom
Tom

Reputation: 51621

So you want to do two checks. Is the value toto and is the value only toto. Here is SQL that will do that. I had it explicitly create the checks as new variables so you can see what is happening. To see the check values for all observations remove the HAVING clause. If you are happy with it you can remove the check variables and just move the conditions into the having clause.

data temp;
  row+1;
  input siren $ class $ imput $;
cards;
A CP titi
B CP toto
C CE tata
D CE tata
F CM toto
G CM toto
H SU tata
I SU toto
;

proc sql ;
create table want as 
  select *
       , imput ne 'toto' as check1
       , max(imput ne 'toto') as check2
  from temp
  group by class 
  having check1 or not check2
  order by row
;
quit;

To do this with just a DATA step you will want add a "double DOW Loop" so you can calculate the overall flag for a group in the first loop and then process the individual rows in the second loop. Note this requires the input dataset is sorted (or at least grouped) by class.

data want;
  do until(last.class);
    set temp;
    by class notsorted;
    if imput ne 'toto' then check2=1;
  end;
  do until(last.class);
    set temp;
    by class notsorted;
    if imput ne 'toto' or not check2 then output;
  end;
  drop check2;
run;

Upvotes: 2

Abolfazl
Abolfazl

Reputation: 1710

I think this is what you want :

;WITH a AS (SELECT class
FROM data
GROUP BY class
HAVING MAX(imput)<>MIN(imput) 
)
DELETE t FROM a JOIN data t ON t.class = a.class
WHERE t.imput='toto'

Upvotes: 0

Related Questions