Reputation: 371
I have many dataframes that contain the same data, except for a few column differences between them that I want to remove. Here's something similar to what I have:
df1 <- data.frame(X = c(1, 2, 3, 4, 5),
var1 = c('a', 'b', 'c', 'd', 'e'),
var2 = c(1, 1, 0, 0, 1))
df2 <- data.frame(X..x = c(1, 2, 3, 4, 5),
X..y = c(1, 2, 3, 4, 5),
var1 = c('f', 'g', 'h', 'i', 'j'),
var2 = c(0, 1, 0, 1, 1))
df_list <- list(df1=df1,df2=df2)
I am trying to create a function to remove the X, X..x, and X..y columns from each of the dataframes. Here's what I've tried with the given error:
remove_col <- function(df){
df = subset(df, select = -c(X, X..x, X..y))
return(df)
}
df_list <- lapply(df_list, remove_col)
# Error in eval(substitute(select), nl, parent.frame()) :
# object 'X..x' not found
I'm running into problems because not all dataframes contain X, and similarly not all dataframes contain X..x and X..y. How can I update the function so that it can be applied to all dataframes in the list and successfully remove its given columns?
Using R version 3.5.1, Mac OS X 10.13.6
Upvotes: 0
Views: 73
Reputation: 887008
Instead of checking each list element for the same column names, it can be automated if we can extract the intersecting column names across the list. Loop over the list, get the column names, find the intersecting elements with Reduce, and use that to subset the columns:
nm1 <- Reduce(intersect, lapply(df_list, names))
lapply(df_list, `[`, nm1)
#$df1
# var1 var2
#1 a 1
#2 b 1
#3 c 0
#4 d 0
#5 e 1
#$df2
# var1 var2
#1 f 0
#2 g 1
#3 h 0
#4 i 1
#5 j 1
Or with tidyverse
library(dplyr)
library(purrr)
map(df_list, names) %>%
reduce(intersect) %>%
map(df_list, select, .)
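If the columns to drop are known by name, a tidyselect helper may be more direct than intersecting names. The sketch below assumes dplyr 1.0.0 or later, where `any_of()` is available; it ignores names that a given data frame does not contain. The vector `drop_cols` and the result name `cleaned` are introduced here for illustration only.

```r
library(dplyr)
library(purrr)

# Rebuild the example data from the question
df1 <- data.frame(X = 1:5, var1 = letters[1:5], var2 = c(1, 1, 0, 0, 1))
df2 <- data.frame(X..x = 1:5, X..y = 1:5,
                  var1 = letters[6:10], var2 = c(0, 1, 0, 1, 1))
df_list <- list(df1 = df1, df2 = df2)

# Hypothetical name: the columns we want gone, wherever they occur
drop_cols <- c("X", "X..x", "X..y")

# any_of() selects only the names that exist, so missing ones do not error
cleaned <- map(df_list, ~ select(.x, -any_of(drop_cols)))
```

Unlike the intersection approach, this keeps any extra column that happens to appear in only some of the data frames.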
Upvotes: 0
Reputation: 39595
You can try:
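# Function: drop the columns listed in `name`, ignoring any that are absent
remove_col <- function(df, name){
  # A logical index avoids df[, -integer(0)], which would drop every column
  keep <- !(names(df) %in% name)
  df[, keep, drop = FALSE]
}
df_list <- lapply(df_list, remove_col, name = c('X', 'X..x', 'X..y'))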
$df1
var1 var2
1 a 1
2 b 1
3 c 0
4 d 0
5 e 1
$df2
var1 var2
1 f 0
2 g 1
3 h 0
4 i 1
5 j 1
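A quick way to check any such helper is to run it on a data frame that has none of the listed columns. The sketch below uses a name-based variant built on `setdiff()`; the helper `drop_named` and the test frame `df3` are hypothetical names, not part of the original answer.

```r
# Hypothetical helper: keep every column whose name is not in `name`
drop_named <- function(df, name) {
  df[setdiff(names(df), name)]
}

# A data frame with none of the X columns should come back unchanged
df3 <- data.frame(var1 = c("k", "l"), var2 = c(1, 0))
res3 <- drop_named(df3, c("X", "X..x", "X..y"))
names(res3)
# "var1" "var2"
```

Selecting by name with single-bracket list-style indexing also keeps the result a data frame when only one column remains.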
Upvotes: 2
Reputation: 4358
If you want to keep only the columns whose names contain "var":
lapply(df_list, function(x) x[grepl("var",colnames(x))])
Or, if you really just want those columns removed explicitly:
lapply(df_list, function(x) x[!grepl("^X$|^X\\.\\.x$|^X\\.\\.y$",colnames(x))])
$df1
var1 var2
1 a 1
2 b 1
3 c 0
4 d 0
5 e 1
$df2
var1 var2
1 f 0
2 g 1
3 h 0
4 i 1
5 j 1
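The three alternatives in the explicit pattern share a prefix, so they can be folded into one shorter expression. This is only a sketch of an equivalent pattern; `pat` and `cols` are names made up for the check.

```r
# Matches exactly "X", "X..x" or "X..y": an X optionally followed by "..x" or "..y"
pat <- "^X(\\.\\.[xy])?$"

# Hypothetical column names, including one ("Xvar") that must not match
cols <- c("X", "X..x", "X..y", "var1", "Xvar")
matched <- grepl(pat, cols)

# Same use as above:
# lapply(df_list, function(x) x[!grepl(pat, colnames(x))])
```

The anchors `^` and `$` matter here: without them, a column such as `Xvar` would also be removed.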
Upvotes: 0