hmhensen
hmhensen

Reputation: 3195

Find if any of values in group is in certain values in SAS

I'm trying to see if any elements within a grouped variable is in a set of values.

Either a SAS or PROC SQL solution will work.

Here's a sample dataset:

data input;/*table name*/
length ID 3 var1 $8;/*define char length*/
input ID var1 $;/*$ signifies char*/
datalines;/*start data*/
1 A
1 A
1 B
2 A
2 A
2 A
;/*end data*/
run;

ID  var1
1   A
1   A
1   B
2   A
2   A
2   A

I want to see if any of the values in var1 within each group ID equal "B" or "C". If one of them is either "B" or "C", then I want the output table to equal 1, else 0. In this example, because group ID = 1 has a "B" in var1, the output of var2 would be equal to 1. For group ID = 2, var2 should equal 0.

In R, I would use any() and this would be fairly straightforward. I know SAS has a similar function in SAS IML, but I'm not sure it can be used the same way. I'm looking for a relatively simple way to do this that avoids the use of do loops or subqueries.

Here's the desired output:

data output;/*table name*/
input ID var2;/*$ signifies char*/
datalines;/*start data*/
1 1
2 0
;/*end data*/
run;

ID  var2
1   1
2   0

I've found a few resources that are similar, but haven't found anything that hits directly on this question.

Similar SQL Question

Similar SAS Community Question 1

Similar SAS Community Question 2

SAS IML any()/all() Question

Upvotes: 0

Views: 1335

Answers (2)

Tom
Tom

Reputation: 51621

SAS evaluates boolean expressions to 1 (TRUE) or 0 (FALSE). So the MAX() over the group will indicate if ANY of them are TRUE.

data input;
  length ID 8 var1 $1 ;
  input ID var1 ;
datalines;
1 A
1 A
1 B
2 A
2 A
2 A
;

proc sql noprint;
create table want as
  select id
       , max( var1 in ('B' 'C') ) as var2
  from input
  group by id
;
quit;

proc print;
run;

PS: The risk involved in only keeping 3 of the 8 bytes needed to store numeric values is not worth any miniscule disk space saved. But for character variables you could save a lot of space by defining the length only as long as needed since character variables could be up to 32,767 bytes long. Also there is no value to adding the $ in the INPUT statement after the variable has already been defined as character. Once a variable's type has been defined you cannot change it anyway.

Upvotes: 2

Reeza
Reeza

Reputation: 21294

A few different concepts here:

  • Use FIRST to set the flag to 0 at the beginning of each ID.
  • Use IN to check for the list of values.
  • Use RETAIN to keep the value as it loops down the column.
  • Use Last to output the flag at the last record of each ID.
data output;
set input;
by ID;
if first.ID then var2=0;
retain var2;
if var1 in ('B', 'C') then var2=1;
if last.ID then output;
drop var1;
run;

Upvotes: 1

Related Questions