Reputation: 3
I have a sas data-set with 5000 rows and 150 variables from a survey of 5000 respondents, but I need to remove the entire row/respondent where the column has missing observation for any of the 150 variables. So basically, I just need those respondents who have completed the answers for all 150 variables.
I am using proc sql or base sas, but I am not able to come up with a simpler way to do this. I have used conditional queries, but some columns are numeric and some are character type and I further need to do analysis on the numeric columns, so transposing doesnt seem to be an alternative.. Any help would be much appreciated?
Thanks
Upvotes: 0
Views: 5265
Reputation: 507
Taking Gordon Linoff's idea about Excel a step further with just SAS...
ods output SQL_Results=appliance;
proc sql number;
select * from sashelp.applianc;
quit;
data appliance_2;
set appliance;
if cmiss(of _all_) = 0;
run;
proc sql; create table que as select * from dictionary.columns where libname = "WORK" and memname = "APPLIANCE"; quit;
proc sql ;
select name, "IS NOT NULL AND"
from dictionary.columns where libname = "WORK" and memname = "APPLIANCE";
quit;
*copy / paste / clean-up ;
proc sql;
create table appliance_3 as
select * from appliance
where
Row IS NOT NULL AND
units_1 IS NOT NULL AND
units_2 IS NOT NULL AND
units_3 IS NOT NULL AND
units_4 IS NOT NULL AND
units_5 IS NOT NULL AND
units_6 IS NOT NULL AND
units_7 IS NOT NULL AND
units_8 IS NOT NULL AND
units_9 IS NOT NULL AND
units_10 IS NOT NULL AND
units_11 IS NOT NULL AND
units_12 IS NOT NULL AND
units_13 IS NOT NULL AND
units_14 IS NOT NULL AND
units_15 IS NOT NULL AND
units_16 IS NOT NULL AND
units_17 IS NOT NULL AND
units_18 IS NOT NULL AND
units_19 IS NOT NULL AND
units_20 IS NOT NULL AND
units_21 IS NOT NULL AND
units_22 IS NOT NULL AND
units_23 IS NOT NULL AND
units_24 IS NOT NULL AND
cycle IS NOT NULL
;quit;
Upvotes: 0
Reputation: 6378
With data step it is just:
data want;
set have;
if cmiss(of _all_) = 0;
run;
Will handle character and numeric variables.
Upvotes: 3
Reputation: 1270773
SAS procs tend to ignore missing values by removing the entire row from the data being analyzed. So, this might be less of a problem than you think. That is, if you are doing a forward selection logistic regression, add in a bunch of variables, then only rows with no missing values for those columns will be processed.
If you want to create a new data set where the columns have no missing values, you can do something like this:
proc sql;
create table t_nomissing
select t.*
from t
where col1 is not null and col2 is not null and col3 is not null and . . .
col150 is not null;
If you have a list of the column names, I would recommend creating the where
clause in a tool such as Excel where you can use formulas and copy them down.
Upvotes: 0