Reputation:
I have a dataset with 50 variables (columns), and 30 of them have missing values in more than half of their observations.
I want to subset the dataset so that those 30 variables with too many missing values are removed. I could drop them one by one, but I was wondering whether there is a quicker way to do it in R.
Upvotes: 1
Views: 295
Reputation: 8413
Logic: first iterate over each column using sapply
and check which columns have fewer than half of their values missing. The output of the first line is a named logical vector (one entry per column), which the second line uses to subset the data.
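# TRUE for each column whose NA count is below half the number of rows
ind <- sapply(colnames(df), function(x) sum(is.na(df[[x]])) < nrow(df) / 2)
# keep only the columns flagged TRUE
df <- df[colnames(df)[ind]]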
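The same check can probably be written more compactly with colMeans, since is.na(df) returns a logical matrix and the mean of a logical column is its fraction of missing values. Below is a minimal sketch on a made-up four-column data frame; the toy data and the names na_frac and df_kept are only for illustration and are not from the question.

```r
# Toy data frame (hypothetical; stands in for the 50-column dataset)
df <- data.frame(
  a = c(1, NA, NA, NA),  # 75% missing
  b = c(1, 2, NA, 4),    # 25% missing
  c = c(NA, NA, 3, 4),   # exactly 50% missing
  d = 1:4                # nothing missing
)

# Fraction of missing values in each column
na_frac <- colMeans(is.na(df))

# Keep the columns with less than half of their values missing;
# drop = FALSE keeps the result a data frame even if one column survives
df_kept <- df[, na_frac < 0.5, drop = FALSE]

names(df_kept)
```

Like the sapply version, the strict comparison also drops a column that is exactly half missing. If only columns with more than half missing should go, na_frac <= 0.5 would keep the borderline ones.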
Upvotes: 1