imprela
imprela

Reputation: 83

delete columns in R for certain variables only

I have a dataframe x with column names that have a similar prefix (age_1, age_2, age_3,...,age_n). I want to delete columns with prefix 'age_' that are all NA. In the example below it will be age_2 and age_4. But in my dataset it might go up to age_100. Is there a way to do this?

Sample dataframe:

x <- data.frame("age_1" = c(0,1,1,0), "age_2" = NA , "age_3" = c(1,0,0,0), "age_4"=NA, "name_1" = NA, "name_2" = NA)

Sample output:

y <- data.frame("age_1" = c(0,1,1,0), "age_3" = c(1,0,0,0), "name_1" = NA, "name_2" = NA)

Upvotes: 0

Views: 87

Answers (3)

R. Schifini
R. Schifini

Reputation: 9313

You want to select by two conditions: the column name starts with age_ and all its elements are NA. You can select those columns by:

sel = grepl("^age_",colnames(x), ignore.case = T) & sapply(x, FUN = function(x){all(is.na(x))})

and then do:

new_x = x[,!sel]

Result

  age_1 age_3 name_1 name_2
1     0     1     NA     NA
2     1     0     NA     NA
3     1     0     NA     NA
4     0     0     NA     NA

The grepl(...) will be true only for those columns that start with age_

The sapply(...) will go through each column and will return true when all elements are NA (is.na()).

The intersection of both is what you need to leave out (sel negated with !).

Upvotes: 2

markus
markus

Reputation: 26373

Another option is to use colSums and grepl

x[, as.vector(!is.na(colSums(x))) | grepl(pattern = "name", names(x))]
#  age_1 age_3 name_1 name_2
#1     0     1     NA     NA
#2     1     0     NA     NA
#3     1     0     NA     NA
#4     0     0     NA     NA

Upvotes: 0

adl
adl

Reputation: 1441

y_1 <- x %>% select(which(!(names(.) %in% grep("^[age]", names(.), value = T))))
y_2 <- x[colSums(!is.na(x)) > 0]
y <- bind_cols(y_1,y_2)

Not elegant but works

Upvotes: 0

Related Questions