Reputation: 83
I have a dataframe x with column names that have a similar prefix (age_1, age_2, age_3,...,age_n). I want to delete columns with prefix 'age_' that are all NA. In the example below it will be age_2 and age_4. But in my dataset it might go up to age_100. Is there a way to do this?
Sample dataframe:
x <- data.frame("age_1" = c(0,1,1,0), "age_2" = NA , "age_3" = c(1,0,0,0), "age_4"=NA, "name_1" = NA, "name_2" = NA)
Sample output:
y <- data.frame("age_1" = c(0,1,1,0), "age_3" = c(1,0,0,0), "name_1" = NA, "name_2" = NA)
Upvotes: 0
Views: 87
Reputation: 9313
You want to select by two conditions: the column name starts with age_
and all its elements are NA. You can select those columns by:
sel = grepl("^age_",colnames(x), ignore.case = T) & sapply(x, FUN = function(x){all(is.na(x))})
and then do:
new_x = x[,!sel]
Result
age_1 age_3 name_1 name_2
1 0 1 NA NA
2 1 0 NA NA
3 1 0 NA NA
4 0 0 NA NA
The grepl(...)
will be true only for those columns that start with age_
The sapply(...)
will go through each column and will return true when all
elements are NA (is.na()
).
The intersection of both is what you need to leave out (sel
negated with !
).
Upvotes: 2
Reputation: 26373
Another option is to use colSums
and grepl
x[, as.vector(!is.na(colSums(x))) | grepl(pattern = "name", names(x))]
# age_1 age_3 name_1 name_2
#1 0 1 NA NA
#2 1 0 NA NA
#3 1 0 NA NA
#4 0 0 NA NA
Upvotes: 0
Reputation: 1441
y_1 <- x %>% select(which(!(names(.) %in% grep("^[age]", names(.), value = T))))
y_2 <- x[colSums(!is.na(x)) > 0]
y <- bind_cols(y_1,y_2)
Not elegant but works
Upvotes: 0