Reputation: 3195
I'm trying to see if any elements within a grouped variable is in a set of values.
Either a SAS or PROC SQL solution will work.
Here's a sample dataset:
data input;/*table name*/
length ID 3 var1 $8;/*define char length*/
input ID var1 $;/*$ signifies char*/
datalines;/*start data*/
1 A
1 A
1 B
2 A
2 A
2 A
;/*end data*/
run;
ID var1
1 A
1 A
1 B
2 A
2 A
2 A
I want to see if any of the values in var1
within each group ID
equal "B" or "C". If one of them is either "B" or "C", then I want the output table to equal 1, else 0. In this example, because group ID = 1 has a "B" in var1
, the output of var2
would be equal to 1. For group ID = 2, var2
should equal 0.
In R, I would use any()
and this would be fairly straightforward. I know SAS has a similar function in SAS IML, but I'm not sure it can be used the same way. I'm looking for a relatively simple way to do this that avoids the use of do
loops or subqueries.
Here's the desired output:
data output;/*table name*/
input ID var2;/*$ signifies char*/
datalines;/*start data*/
1 1
2 0
;/*end data*/
run;
ID var2
1 1
2 0
I've found a few resources that are similar, but haven't found anything that hits directly on this question.
Similar SAS Community Question 1
Similar SAS Community Question 2
Upvotes: 0
Views: 1335
Reputation: 51621
SAS evaluates boolean expressions to 1 (TRUE) or 0 (FALSE). So the MAX() over the group will indicate if ANY of them are TRUE.
data input;
length ID 8 var1 $1 ;
input ID var1 ;
datalines;
1 A
1 A
1 B
2 A
2 A
2 A
;
proc sql noprint;
create table want as
select id
, max( var1 in ('B' 'C') ) as var2
from input
group by id
;
quit;
proc print;
run;
PS: The risk involved in only keeping 3 of the 8 bytes needed to store numeric values is not worth any miniscule disk space saved. But for character variables you could save a lot of space by defining the length only as long as needed since character variables could be up to 32,767 bytes long. Also there is no value to adding the $ in the INPUT statement after the variable has already been defined as character. Once a variable's type has been defined you cannot change it anyway.
Upvotes: 2
Reputation: 21294
A few different concepts here:
data output;
set input;
by ID;
if first.ID then var2=0;
retain var2;
if var1 in ('B', 'C') then var2=1;
if last.ID then output;
drop var1;
run;
Upvotes: 1