user5034550


Subsetting variables with missing values in R

I have a dataset with 50 variables (columns), and 30 of them have missing values in more than half of their observations.

I want a subset of the dataset with those 30 variables removed. I could drop them one by one, but I was wondering whether there is a quicker way to do it in R.

Upvotes: 1

Views: 295

Answers (1)

joel.wilson

Reputation: 8413

Logic: Iterate over the columns with sapply and check which ones have fewer than half of their values missing. The sapply call returns a logical vector, which is then used to subset the data frame.

# TRUE for each column with fewer than half of its values missing
ind <- sapply(colnames(df), function(x) sum(is.na(df[[x]])) < nrow(df) / 2)
# Keep only those columns
df <- df[colnames(df)[ind]]
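
For anyone who wants to try this out, here is a small self-contained sketch of the same idea in vectorized form, using colMeans(is.na(df)) instead of sapply. The toy data frame, its column names and the variable names na_frac and kept are made up for illustration and are not from the question.

```r
# Toy data frame (hypothetical): 6 rows, 3 columns
df <- data.frame(
  a = c(1, 2, NA, 4, 5, 6),         # 1 of 6 missing
  b = c(NA, NA, NA, NA, 5, 6),      # 4 of 6 missing
  c = c("x", NA, NA, NA, "y", "z")  # 3 of 6 missing (exactly half)
)

# Fraction of missing values in each column
na_frac <- colMeans(is.na(df))

# Keep only columns with strictly less than half of their values missing;
# drop = FALSE keeps the result a data frame even if one column survives
kept <- df[, na_frac < 0.5, drop = FALSE]

names(kept)
```

Like the sapply version, this also drops a column that is exactly half missing. If such columns should stay (the question only asks to remove columns with more than half missing), compare with na_frac <= 0.5 instead.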

Upvotes: 1
