Removing columns based on a vector of names in R

Question

I have a data.frame called DATA. Using BASE R, how can I remove any variables in DATA whose names appear in the following vector: ar = c("out", "Name", "mdif", "stder", "mpre")?

Currently, I use DATA[ , !names(DATA) %in% ar]. This removes the unwanted variables, but the remaining columns with duplicated names come back with a nuisance .1 suffix (X.1, Y.1, Z.1).

After extraction, is it possible to remove just the suffixes?

Note1: We have NO ACCESS to r; the only input is DATA.

Note2: This is toy data, so a functional solution is appreciated.

r <- list(
 data.frame(Name = rep("Jacob", 6), 
           X = c(2,2,1,1,NA, NA), 
           Y = c(1,1,1,2,1,NA), 
           Z = rep(3, 6), 
         out = rep(1, 6)), 

 data.frame(Name = rep("Jon", 6), 
           X = c(1,NA,3,1,NA,NA), 
           Y = c(1,1,1,2,NA,NA), 
           Z = rep(2, 6), 
         out = rep(1, 6)))

DATA <- do.call(cbind, r)  ## DATA

ar = c("out", "Name", "mdif" , "stder" , "mpre") # The names for exclusion

DATA[ , !names(DATA) %in% ar]      ## Current solution
#>
#    X  Y Z X.1 Y.1 Z.1          ## X.1 Y.1 Z.1 are created automatically but are not needed
# 1  2  1 3   1   1   2
# 2  2  1 3  NA   1   2
# 3  1  1 3   3   1   2
# 4  1  2 3   1   2   2
# 5 NA  1 3  NA  NA   2
# 6 NA NA 3  NA  NA   2

Ronak Shah · Accepted Answer

Ideally, column names should be unique. If you want to keep the duplicated column names, you can remove the suffixes with sub after extraction:

DATA1 <- DATA[ , !names(DATA) %in% ar]
# Strip the trailing ".<digits>" suffix; backslashes must be doubled in an R string
names(DATA1) <- sub("\\.\\d+$", "", names(DATA1))

DATA1
#   X  Y Z  X  Y Z
#1  2  1 3  1  1 2
#2  2  1 3 NA  1 2
#3  1  1 3  3  1 2
#4  1  2 3  1  2 2
#5 NA  1 3 NA NA 2
#6 NA NA 3 NA NA 2
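
Since the question asks for a functional solution, here is a minimal sketch of an alternative that avoids the regex altogether. It relies on the fact that DATA itself still holds the duplicated names (cbind does not rename them; the .1 suffixes only appear when `[` subsets the data frame), so the original names can be copied back after subsetting. The helper name drop_cols is hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): drop columns by name
# while keeping the original, possibly duplicated, column names.
drop_cols <- function(data, drop) {
  keep <- !names(data) %in% drop       # TRUE for the columns to retain
  out <- data[, keep, drop = FALSE]    # `[` makes duplicated names unique here
  names(out) <- names(data)[keep]      # copy the original names back
  out
}

# Rebuild the toy data from the question
r <- list(
  data.frame(Name = rep("Jacob", 6), X = c(2, 2, 1, 1, NA, NA),
             Y = c(1, 1, 1, 2, 1, NA), Z = rep(3, 6), out = rep(1, 6)),
  data.frame(Name = rep("Jon", 6), X = c(1, NA, 3, 1, NA, NA),
             Y = c(1, 1, 1, 2, NA, NA), Z = rep(2, 6), out = rep(1, 6)))
DATA <- do.call(cbind, r)
ar <- c("out", "Name", "mdif", "stder", "mpre")

DATA1 <- drop_cols(DATA, ar)
names(DATA1)
#> [1] "X" "Y" "Z" "X" "Y" "Z"
```

Unlike the sub approach, this leaves alone any column whose real name happens to end in a dot followed by digits (for example a column actually called score.1), because no name is edited by pattern.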
