Reputation: 371

Creating a function to remove columns with different names from a list of dataframes

I have many dataframes that contain the same data, except for a few column differences between them that I want to remove. Here's something similar to what I have:

df1 <- data.frame(X = c(1, 2, 3, 4, 5),
                  var1 = c('a', 'b', 'c', 'd', 'e'),
                  var2 = c(1, 1, 0, 0, 1))
df2 <- data.frame(X..x = c(1, 2, 3, 4, 5),
                  X..y = c(1, 2, 3, 4, 5),
                  var1 = c('f', 'g', 'h', 'i', 'j'),
                  var2 = c(0, 1, 0, 1, 1))
df_list <- list(df1=df1,df2=df2)

I am trying to create a function to remove the X, X..x, and X..y columns from each of the dataframes. Here's what I've tried, along with the resulting error:

remove_col <- function(df){
  df = subset(df, select = -c(X, X..x, X..y))
  return(df)
}
df_list <- lapply(df_list, remove_col)

#  Error in eval(substitute(select), nl, parent.frame()) : 
#  object 'X..x' not found

I'm running into problems because not all dataframes contain X, and similarly not all dataframes contain X..x and X..y. How can I update the function so that it can be applied to every dataframe in the list and remove whichever of these columns each one contains?

Using R version 3.5.1, Mac OS X 10.13.6

Upvotes: 0

Answers (3)

akrun

Reputation: 887008

Instead of checking each list element for the same column names, this can be automated by extracting the column names that are common to every element of the list. Loop over the list, get the column names, find the intersecting elements with Reduce, and use that result to subset the columns:

nm1 <- Reduce(intersect, lapply(df_list, names))
lapply(df_list, `[`, nm1)
#$df1
#  var1 var2
#1    a    1
#2    b    1
#3    c    0
#4    d    0
#5    e    1

#$df2
#  var1 var2
#1    f    0
#2    g    1
#3    h    0
#4    i    1
#5    j    1

Or with tidyverse

library(dplyr)
library(purrr)
map(df_list, names) %>%
     reduce(intersect) %>%
     map(df_list, select, .)
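
To make the Reduce/intersect step easier to follow, here is a minimal base R sketch that splits it into its intermediate results. The data mirrors the question; the names name_list, common and trimmed are only illustrative.

```r
# Sample data mirroring the question
df1 <- data.frame(X = 1:5, var1 = letters[1:5], var2 = c(1, 1, 0, 0, 1))
df2 <- data.frame(X..x = 1:5, X..y = 1:5,
                  var1 = letters[6:10], var2 = c(0, 1, 0, 1, 1))
df_list <- list(df1 = df1, df2 = df2)

# Step 1: one character vector of column names per data frame
name_list <- lapply(df_list, names)

# Step 2: fold intersect() over those vectors to get the shared names
common <- Reduce(intersect, name_list)
common
# [1] "var1" "var2"

# Step 3: keep only the shared columns in every data frame
trimmed <- lapply(df_list, `[`, common)
```

Note that this keeps only the columns present in every data frame, so a column that exists in some elements but not all of them would be dropped as well.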

Upvotes: 0

Duck

Reputation: 39595

You can try:

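#Function
remove_col <- function(df, name){
  # Logical index of the columns to keep; this also works when none of the names match
  keep <- !(names(df) %in% name)
  # drop = FALSE keeps the result a data frame even if one column remains
  df <- df[, keep, drop = FALSE]
  return(df)
}
df_list <- lapply(df_list, remove_col, name = c('X', 'X..x', 'X..y'))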

$df1
  var1 var2
1    a    1
2    b    1
3    c    0
4    d    0
5    e    1

$df2
  var1 var2
1    f    0
2    g    1
3    h    0
4    i    1
5    j    1
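
A case worth checking with any remover of this kind is a data frame that has none of the listed columns: negative indexing with an empty index (df[, -integer(0)]) selects zero columns rather than all of them. The sketch below, assuming a hypothetical helper name drop_cols and an extra made-up data frame df3, sidesteps that by computing the names to keep with setdiff.

```r
# drop_cols is a hypothetical helper name used only for this sketch
drop_cols <- function(df, name) {
  keep <- setdiff(names(df), name)  # column names not listed for removal
  df[, keep, drop = FALSE]          # drop = FALSE always returns a data frame
}

df1 <- data.frame(X = 1:5, var1 = letters[1:5], var2 = c(1, 1, 0, 0, 1))
# df3 is made up for illustration: it has none of the columns to remove
df3 <- data.frame(var1 = c("k", "l"), var2 = c(1, 0))
df_list <- list(df1 = df1, df3 = df3)

to_remove <- c("X", "X..x", "X..y")
cleaned <- lapply(df_list, drop_cols, name = to_remove)
lapply(cleaned, names)  # both elements keep var1 and var2

# For comparison: an empty negative index leaves no columns at all
ncol(df3[, -which(names(df3) %in% to_remove)])
# [1] 0
```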

Upvotes: 2

Daniel O

Reputation: 4358

If you want to keep only the columns whose names contain "var":

lapply(df_list, function(x) x[grepl("var",colnames(x))])

Or, if you really just want those columns removed explicitly:

lapply(df_list, function(x) x[!grepl("^X$|^X\\.\\.x$|^X\\.\\.y$",colnames(x))])
    
$df1
  var1 var2
1    a    1
2    b    1
3    c    0
4    d    0
5    e    1

$df2
  var1 var2
1    f    0
2    g    1
3    h    0
4    i    1
5    j    1
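
The ^ and $ anchors and the escaped dots in the second pattern are what restrict the match to exactly those three names. A small sketch of the difference, using a made-up column name Xtra that is not in the question's data:

```r
# "Xtra" is a hypothetical column name added to show the effect of anchoring
cols <- c("X", "X..x", "X..y", "var1", "Xtra")

anchored <- "^X$|^X\\.\\.x$|^X\\.\\.y$"

# Anchored: only the three exact names are removed
kept_anchored <- cols[!grepl(anchored, cols)]
kept_anchored
# [1] "var1" "Xtra"

# Unanchored: every name containing an "X" is removed, including "Xtra"
kept_loose <- cols[!grepl("X", cols)]
kept_loose
# [1] "var1"
```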

Upvotes: 0
