CA_CA
CA_CA

Reputation: 143

Remove rows where fields contain null values

I have the following data set

col1 q1 q2 q3 q4 q5 
a    -  2  1  2  2
b    - -   -  -  -
b    1 2  -   -  - 
d    3 -  -  -   -

I want to apply a condition in SAS that allows me to delete the cases where all the columns (q1 through q5) are nulls. I would like to see this:

col1 q1 q2 q3 q4 q5 
a    -  2  1  2  2
b    1 2  -   -  - 
d    3 -  -  -   -

Upvotes: 2

Views: 1825

Answers (1)

mjsqu
mjsqu

Reputation: 5452

The most elegant solution I've been able to find for this is:

data want;
  array q[5];
  set have;
  if max(of q[*]) = . then delete;
run;

Step-by-step

  • data statement sets up the output dataset
  • array statement initialises a 5 element array named q
  • set statement brings in the input dataset. Because the input dataset has variables named q1...q5, these are populated into the array.
  • if statement: in SAS any numeric value is greater than missing, so any case where the maximum is missing implies all values of q are missing. The max(of q[*]) syntax simply brings back the maximum of all (* used to denote all) elements of the array q. delete removes the qualifying observations before they can be loaded to the output dataset.
  • run statement - data-step boundary

Without Array

Thanks to SRSwift

data want;
  set have;
  if max(of q:) = . then delete;
run;

Using the n function

Thanks to Joe:

data want;
  set have;
  if n(of q:);
run;

As Joe explains in his comment, n counts the non-missing values, and an if statement followed by a condition only allows observations to pass into the output dataset if that condition is true (or a non-zero integer). In the case of all missing, the n function returns zero, the condition is not met and the observation is removed.

To illustrate this, here is the input dataset again, with an n column added:

col1 q1 q2 q3 q4 q5 n 
a    .  2  1  2  2  4 
b    .  .  .  .  .  0 
b    1  2  .  .  .  2 
d    3  .  .  .  .  1 

Upvotes: 5

Related Questions