Reputation: 143
I have the following data set
col1 q1 q2 q3 q4 q5
a - 2 1 2 2
b - - - - -
b 1 2 - - -
d 3 - - - -
I want to apply a condition in SAS that allows me to delete the cases where all the columns (q1 through q5) are nulls. I would like to see this:
col1 q1 q2 q3 q4 q5
a - 2 1 2 2
b 1 2 - - -
d 3 - - - -
Upvotes: 2
Views: 1825
Reputation: 5452
The most elegant solution I've been able to find for this is:
data want;
array q[5];
set have;
if max(of q[*]) = . then delete;
run;
Step-by-step
data
statement sets up the output datasetarray
statement initialises a 5 element array named q
set
statement brings in the input dataset. Because the input dataset has variables named q1
...q5
, these are populated into the array.if
statement: in SAS any numeric value is greater than missing, so any case where the maximum is missing implies all values of q
are missing. The max(of q[*])
syntax simply brings back the maximum of all (* used to denote all) elements of the array q
. delete
removes the qualifying observations before they can be loaded to the output dataset.run
statement - data-step boundaryWithout Array
Thanks to SRSwift
data want;
set have;
if max(of q:) = . then delete;
run;
Using the n function
Thanks to Joe:
data want;
set have;
if n(of q:);
run;
As Joe explains in his comment, n
counts the non-missing values, and an if
statement followed by a condition only allows observations to pass into the output dataset if that condition is true (or a non-zero integer). In the case of all missing, the n
function returns zero, the condition is not met and the observation is removed.
To illustrate this, here is the input dataset again, with an n
column added:
col1 q1 q2 q3 q4 q5 n
a . 2 1 2 2 4
b . . . . . 0
b 1 2 . . . 2
d 3 . . . . 1
Upvotes: 5