Reputation: 7517
I have a data.frame called DATA. Using base R, how can I remove any variables in DATA that are named any of the following: ar = c("out", "Name", "mdif", "stder", "mpre")?
Currently, I use DATA[ , !names(DATA) %in% ar]. While this removes the unwanted variables, it also renames the remaining duplicated variables with a nuisance .1 suffix.
After extraction, is it possible to remove just suffixes?
Note 1: We have NO ACCESS to r; the only input is DATA.
Note 2: This is toy data; a functional solution is appreciated.
r <- list(
  data.frame(Name = rep("Jacob", 6),
             X = c(2, 2, 1, 1, NA, NA),
             Y = c(1, 1, 1, 2, 1, NA),
             Z = rep(3, 6),
             out = rep(1, 6)),
  data.frame(Name = rep("Jon", 6),
             X = c(1, NA, 3, 1, NA, NA),
             Y = c(1, 1, 1, 2, NA, NA),
             Z = rep(2, 6),
             out = rep(1, 6)))
DATA <- do.call(cbind, r)                            ## DATA
ar = c("out", "Name", "mdif", "stder", "mpre")       # The names for exclusion
DATA[ , !names(DATA) %in% ar]                        ## Current solution
#    X  Y Z X.1 Y.1 Z.1   ## X.1 Y.1 Z.1 are created automatically but not needed
# 1  2  1 3   1   1   2
# 2  2  1 3  NA   1   2
# 3  1  1 3   3   1   2
# 4  1  2 3   1   2   2
# 5 NA  1 3  NA  NA   2
# 6 NA NA 3  NA  NA   2
Upvotes: 2
Views: 1614
Reputation: 886998
In base R, if we store the logical column index in an object, we can reuse it to restore the original column names instead of manipulating the suffixed names afterwards:
i1 <- !names(DATA) %in% ar
DATA1 <- setNames(DATA[i1], names(DATA)[i1])
DATA1
# X Y Z X Y Z
#1 2 1 3 1 1 2
#2 2 1 3 NA 1 2
#3 1 1 3 3 1 2
#4 1 2 3 1 2 2
#5 NA 1 3 NA NA 2
#6 NA NA 3 NA NA 2
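The suffixes come from the subsetting step itself: when the selected columns have duplicated names, data frame subsetting makes them unique in the same way make.unique() does. The following is a minimal sketch of that behaviour and of the setNames() fix, using a small made-up data frame (df, keep, sub_df and restored are illustrative names, not part of the question):

```r
# Two toy data frames with the same column names, bound column-wise
# (cbind() on data frames keeps the duplicated names).
df <- do.call(cbind, list(
  data.frame(Name = "a", X = 1, out = 1),
  data.frame(Name = "b", X = 2, out = 1)
))
names(df)

# make.unique() shows the suffixing rule that subsetting applies.
make.unique(c("X", "X"))

# Subsetting columns with duplicated names triggers the renaming.
keep   <- !names(df) %in% c("Name", "out")
sub_df <- df[keep]
names(sub_df)

# Reusing the same index on the original names restores them.
restored <- setNames(sub_df, names(df)[keep])
names(restored)
```

Because the index is applied to the untouched names(df), no pattern matching on the suffix is needed.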
For reusability, we can wrap this in a function:
f1 <- function(dat, vec) {
  i1 <- !names(dat) %in% vec
  setNames(dat[i1], names(dat)[i1])
}
f1(DATA, ar)
If the datasets are stored in a list, use lapply to loop over the list and apply f1:
lst1 <- list(DATA, DATA)
lapply(lst1, f1, vec = ar)
If the 'ar' elements also differ between list elements, put them in a list of the same length and use Map:
ar1 <- c("out", "Name")
ar2 <- c("mdif", "stder", "mpre")
arLst <- list(ar1, ar2)
Map(f1, lst1, vec = arLst)
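As a self-contained illustration of the Map() variant, here is a small sketch with two made-up data frames (d1, d2) and one exclusion vector per data frame. The helper is redefined so the snippet runs on its own; the data and the name res are assumptions for the example only.

```r
# Helper: drop the columns named in `vec`, keeping the original names.
f1 <- function(dat, vec) {
  keep <- !names(dat) %in% vec
  setNames(dat[keep], names(dat)[keep])
}

# Hypothetical inputs: each data frame has its own set of unwanted columns.
d1 <- data.frame(Name = "a", X = 1, out = 1)
d2 <- data.frame(X = 2, mdif = 0.5, stder = 0.1)

# Map() pairs the i-th data frame with the i-th exclusion vector.
res <- Map(f1,
           list(d1, d2),
           vec = list(c("out", "Name"), c("mdif", "stder", "mpre")))

lapply(res, names)
```

Names listed for exclusion that do not occur in a data frame (here "mpre") are simply ignored, since %in% only tests membership.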
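Here is another option using the tidyverse. The names are made unique first, then the unwanted columns are dropped, and finally the numeric suffixes are stripped. Note that set_names() comes from purrr, and that matches() treats the pattern as a regular expression, so it also drops any column whose name merely contains one of the elements of ar.
library(dplyr)
library(purrr)    # provides set_names()
library(stringr)
DATA %>%
  set_names(make.unique(names(.))) %>%
  select(-matches(str_c(ar, collapse = "|"))) %>%
  set_names(str_remove(names(.), "\\.\\d+$"))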
# X Y Z X Y Z
#1 2 1 3 1 1 2
#2 2 1 3 NA 1 2
#3 1 1 3 3 1 2
#4 1 2 3 1 2 2
#5 NA 1 3 NA NA 2
#6 NA NA 3 NA NA 2
NOTE: It is not recommended to have duplicate column names
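One reason for that recommendation, sketched on a tiny made-up data frame (dup is an illustrative name): extraction by name only ever reaches the first of the duplicated columns, so the second one can only be accessed by position.

```r
# Two columns that share the name "X".
dup <- do.call(cbind, list(data.frame(X = 1:2), data.frame(X = 3:4)))
names(dup)

# Both forms return the first "X" column only.
dup$X
dup[["X"]]

# The second column has to be addressed by position.
dup[[2]]

# If distinct names are acceptable, a custom separator is one option.
make.unique(names(dup), sep = "_")
```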
Upvotes: 2
Reputation: 388862
Ideally, column names should be unique, but if you want to keep the duplicated column names, you can remove the suffixes with sub() after extraction. The pattern is anchored with $ so that only a trailing numeric suffix is removed:
DATA1 <- DATA[ , !names(DATA) %in% ar]
names(DATA1) <- sub("\\.\\d+$", "", names(DATA1))
DATA1
# X Y Z X Y Z
#1 2 1 3 1 1 2
#2 2 1 3 NA 1 2
#3 1 1 3 3 1 2
#4 1 2 3 1 2 2
#5 NA 1 3 NA NA 2
#6 NA NA 3 NA NA 2
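A short sketch of how anchoring the pattern with $ changes what gets stripped, using a few made-up column names (nms is hypothetical). Either way, a column whose real name ends in a dot and digits would lose that ending too, so this approach assumes no such names exist in the data.

```r
# Hypothetical names: two suffixed names, one with two numeric parts,
# and one where ".1" sits in the middle of the name.
nms <- c("X.1", "Y.10", "v.2.1", "a.1b")

# Unanchored: removes the first ".<digits>" found anywhere in the name.
unanchored <- sub("\\.\\d+", "", nms)
unanchored

# Anchored: removes ".<digits>" only at the end of the name.
anchored <- sub("\\.\\d+$", "", nms)
anchored
```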
Upvotes: 2