hope288
hope288

Reputation: 1435

Where statement searching for 2 values in any combination of multiple variables in SAS

I am trying to search for a double condition using Proc SQl. I want to search for when 'TTT' AND 'RRR' exist in v1,v2,v3,v4...v10 so it can be that TTT is in V1 and RRR is in v10 or in V1 and V2. There are many combinations but I don't want to type each combination out obviously. At the same time I want to also use an OR statement to search for variables from v1-v10 that either contain ('TTT' AND 'RRR') OR ('GGG') alone. I've been searching around, thought maybe a case when would work but I don't think so and also I need to make it a PROC SQL.

I know the below code is wrong as it's a longer version but, just to get the jist of what I mean:

WHERE date BETWEEN &start. AND &end. 
AND ( ((V1 = 'TTT' and V2 ='RRR') OR (V1 = 'GGG')) OR ((V1 = 'TTT' and V3 ='RRR') OR (V1 = 'GGG')) OR ((V1 = 'TTT' and V4 ='RRR') OR (V1 = 'GGG')) ...)

Thanks, Much Appreciated!


UPDATED version based on @Tom's Answer

 data
diag_table;
input v1 $ v2 $ v3 $ v4 $ v5 $;
cards;
TTT . . RRR .
GGG . . . .
. RRR . . TTT
. . . . .
FFF . . . .
. . RRR1 . .
TTT . . GGG .
. RRR . GGG .
run;
proc print data=diag_table;
quit;

proc sql;
create table diag_found  as
select *
from diag_table
WHERE (whichc('TTT',v1,v2,v3,v4,v5) and whichc('RRR',v1,v2,v3,v4,v5)) or (whichc('GGG',v1,v2,v3,v4,v5));
quit;
proc print data=diag_found;
quit;

enter image description here

The only problem with this code is that it's also grabbing cases where the rows contain GGG + RRR, and GGG + TTT I tried adding parentheses around the two groups but it didn't change anything.

UPDATE: @Tom and @Joe:

Yes, I guess I do mean an XOR with an AND inside.

So if A=TTT B=RRR C=GGG (A AND B) XOR C

Either A+B combination OR C

enter image description here

Thank you!!

Upvotes: 0

Views: 1001

Answers (3)

Tom
Tom

Reputation: 51566

You can use the WHICHC() function to do what you want.

where whichc('TTT',v1,v2) and whichc('RRR',v1,v2)

Note it is a little harder to use in SQL since you cannot use variable lists so you will need to explicitly list each variable name, whichc('TTT',v1,v2,v3,v4,v5), instead of just using a variable list,whichc('TTT', of v1-v5) , like you could in a data step.

Not sure what you mean by GGG alone. But if by that you mean GGG without TTT or RRR then you could use logic like this.

where (whichc('TTT',v1,v2) and whichc('RRR',v1,v2))
   or (whichc('GGG',v1,v) and not (whichc('TTT',v1,v2) or whichc('RRR',v1,v2)))

Upvotes: 4

Joe
Joe

Reputation: 63424

I don't think you need anything particularly complicated here; the key concept which you'll see in user2877959's answer as well as mine is to concatenate the strings into one long string, so you only need a single call (or in my case, three simple calls). Easier in data step than in sql, but it will work either way.

I use CATX with a delimiter to make sure "RR" | "RTTT" does not match, also. Then we just use FIND to find the string that matches.

data have;
  input (v1-v10) (:$3.);
  datalines;
AAA BBB CCC DDD EEE FFF GGG HHH III JJJ
KKK LLL MMM NNN OOO PPP QQQ RRR SSS TTT
UUU VVV WWW XXX YYY ZZZ AAA BBB CCC DDD
GGG TTT RRR AAA BBB CCC DDD EEE FFF HHH
;;;;
run;

proc sql;
  select * from have
      WHERE ( 
           find(catx('|',v1,v2,v3,v4,v5,v6,v7,v8,v9,v10),'RRR') and 
           find(catx('|',v1,v2,v3,v4,v5,v6,v7,v8,v9,v10),'TTT')
          ) ne
          (find(catx('|',v1,v2,v3,v4,v5,v6,v7,v8,v9,v10),'GGG') > 0
          )
    ;
quit;

Of course the view could process the WHERE clause if you wanted it to also.

I've modified the code to add the XOR combination; basically it either contains GGG or TTT+RRR but not GGG+TTT+RRR. This works by simply comparing the two boolean results (note I add >0 to the second to get it to evaluate to a true/false; the first will already evaluate to true/false thanks to and).

If you actually want GGG+RRR to be excluded, you'll have to add some additional criteria; you may simply be better off assigning the values of 'has RRR', 'has TTT', and 'has GGG' to three variables (either in a view, or in the PROC SQL SELECT query) and then evaluate those rather than having to do a bunch of find/whichn/etc.

Upvotes: 2

user2877959
user2877959

Reputation: 1792

You could try regular expressions. Something in the line of this:

where prxmatch('/(TTT.*RRR|RRR.*TTT)|GGG/',cats(v1,v2,v3,v4,v5,v6,v7,v8,v9,v10));

Upvotes: -2

Related Questions