Reputation: 499
I want to delete ALL blank observations from a data set. I only know how to get rid of blanks from one variable:
data a;
set data(where=(var1 ne .)) ;
run;
Here I set a new data set without the blanks from var1. But how to do it, when I want to get rid of ALL the blanks in the whole data set?
Thanks in advance for your answers.
Upvotes: 10
Views: 65529
Reputation: 159
Really weird nobody provided this elegant answer:
if missing(cats(of _all_)) then delete;
Edit: indeed, I didn't realized the cats(of _all_)
returns a dot '.' for missing numeric value.
As a fix, I suggest this, which seems to be more reliable:
*-- Building a sample dataset with test cases --*;
data test;
attrib a format=8.;
attrib b format=$8.;
a=.; b='a'; output;
a=1; b=''; output;
a=.; b=''; output; * should be deleted;
a=.a; b=''; output; * should be deleted;
a=.a; b='.'; output;
a=1; b='b'; output;
run;
*-- Apply the logic to delete blank records --*;
data test2;
set test;
*-- Build arrays of numeric and characters --*;
*-- Note: array can only contains variables of the same type, thus we must create 2 different arrays --*;
array nvars(*) _numeric_;
array cvars(*) _character_;
*-- Delete blank records --*;
*-- Blank record: # of missing num variables + # of missing char variables = # of numeric variables + # of char variables --*;
if nmiss(of _numeric_) + cmiss(of _character_) = dim(nvars) + dim(cvars) then delete;
run;
The main issue being if there is no numeric at all (or not char at all), the creation of an empty array will generate a WARNING and the call to nmiss/cmiss an ERROR.
So, I think so far there is not other option than building a SAS statement outside the data step to identify empty records:
*-- Building a sample dataset with test cases --*;
data test;
attrib a format=8.;
attrib b format=$8.;
a=.; b='a'; output;
a=1; b=''; output;
a=.; b=''; output; * should be deleted;
a=.a; b=''; output; * should be deleted;
a=.a; b='.'; output;
a=1; b='b'; output;
run;
*-- Create a SAS statement which test any missing variable, regardless of its type --*;
proc sql noprint;
select distinct 'missing(' || strip(name) || ')'
into :miss_stmt separated by ' and '
from dictionary.columns
where libname = 'WORK'
and memname = 'TEST'
;
quit;
/*
miss_stmt looks like missing(a) and missing(b)
*/
*-- Delete blank records --*;
data test2;
set test;
if &miss_stmt. then delete;
run;
Upvotes: 0
Reputation: 63424
If you are attempting to get rid of rows where ALL variables are missing, it's quite easy:
/* Create an example with some or all columns missing */
data have;
set sashelp.class;
if _N_ in (2,5,8,13) then do;
call missing(of _numeric_);
end;
if _N_ in (5,6,8,12) then do;
call missing(of _character_);
end;
run;
/* This is the answer */
data want;
set have;
if compress(cats(of _all_),'.')=' ' then delete;
run;
Instead of the compress you could also use OPTIONS MISSING=' ';
beforehand.
If you want to remove ALL Rows with ANY missing values, then you can use NMISS/CMISS functions.
data want;
set have;
if nmiss(of _numeric_) > 0 then delete;
run;
or
data want;
set have;
if nmiss(of _numeric_) + cmiss(of _character_) > 0 then delete;
run;
for all char+numeric variables.
Upvotes: 16
Reputation: 5470
You can do something like this:
data myData;
set myData;
array a(*) _numeric_;
do i=1 to dim(a);
if a(i) = . then delete;
end;
drop i;
This will scan trough all the numeric variables and will delete the observation where it finds a missing value
Upvotes: 6
Reputation:
Here you go. This will work irrespective of the variable being character or numeric.
data withBlanks;
input a$ x y z;
datalines;
a 1 2 3
b 1 . 3
c . . 3
. . .
d . 2 3
e 1 . 3
f 1 2 3
;
run;
%macro removeRowsWithMissingVals(inDsn, outDsn, Exclusion);
/*Inputs:
inDsn: Input dataset with some or all columns missing for some or all rows
outDsn: Output dataset with some or all columns NOT missing for some or all rows
Exclusion: Should be one of {AND, OR}. AND will only exclude rows if any columns have missing values, OR will exclude only rows where all columns have missing values
*/
/*get a list of variables in the input dataset along with their types (i.e., whether they are numericor character type)*/
PROC CONTENTS DATA = &inDsn OUT = CONTENTS(keep = name type varnum);
RUN;
/*put each variable with its own comparison string in a seperate macro variable*/
data _null_;
set CONTENTS nobs = num_of_vars end = lastObs;
/*use NE. for numeric cols (type=1) and NE '' for char types*/
if type = 1 then call symputx(compress("var"!!varnum), compbl(name!!" NE . "));
else call symputx(compress("var"!!varnum), compbl(name!!" NE '' "));
/*make a note of no. of variables to check in the dataset*/
if lastObs then call symputx("no_of_obs", _n_);
run;
DATA &outDsn;
set &inDsn;
where
%do i =1 %to &no_of_obs.;
&&var&i.
%if &i < &no_of_obs. %then &Exclusion;
%end;
;
run;
%mend removeRowsWithMissingVals;
%removeRowsWithMissingVals(withBlanks, withOutBlanksAND, AND);
%removeRowsWithMissingVals(withBlanks, withOutBlanksOR, OR);
Outout of withOutBlanksAND:
a x y z
a 1 2 3
f 1 2 3
Output of withOutBlanksOR:
a x y z
a 1 2 3
b 1 . 3
c . . 3
e 1 . 3
f 1 2 3
Upvotes: 1