Subsetting variables with missing values in R

Question

I have a dataset with 50 variables (columns), and 30 of them have missing values in more than half of their observations.

I want to subset the dataset so that those 30 variables with too many missing values are dropped. I think I could do it one by one, but I was wondering whether there is a quicker way to do it in R.

joel.wilson · Accepted Answer

Logic: First iterate over the columns with sapply and check which ones have missing values in fewer than half of their rows. The result of the first line is a logical vector, which the second line uses to subset the data.

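# TRUE for each column whose NA count is below half the number of rows
ind <- sapply(colnames(df), function(x) sum(is.na(df[[x]])) < nrow(df) / 2)
# Keep only the columns flagged TRUE
df <- df[colnames(df)[ind]]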
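The same idea can probably be written without an explicit loop, since is.na() on a data frame returns a logical matrix and colMeans() then gives the share of missing values per column. The snippet below is only a sketch: the data frame and the names na_share and kept are made up for illustration, and the cutoff uses <= 0.5 on the assumption that a column with exactly half of its values missing should be kept, because the question asks to drop columns with more than half missing.

```r
# Hypothetical example data: 4 rows, 3 columns
df <- data.frame(
  a = c(1, NA, NA, NA),  # 75% missing
  b = c(1, 2, NA, NA),   # exactly 50% missing
  c = c(1, 2, 3, 4)      # no missing values
)

# Share of missing values in each column
na_share <- colMeans(is.na(df))

# Keep columns where at most half of the values are missing;
# drop = FALSE keeps the result a data frame even if one column remains
kept <- df[, na_share <= 0.5, drop = FALSE]
```

Here kept contains columns b and c. With a strict < comparison, column b (exactly half missing) would be dropped as well, so pick the comparison that matches the cutoff you actually want.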
