Reputation: 627

remove a specific column from many dataframes stored in a list

I have a piece of code that reads in many data frames and then rbinds them:

data.files = paths %>% ##takes the names of all the objects that I want to read in
  map(read.csv) %>% ##this reads all the correctly named .csv files into a list object
  reduce(rbind) ##reduces them all from the list into a single dataframe by rbind

Here paths is a vector of names of the .csv files to be read in. The problem is that many of these files are missing a single column, LaserEnergy, which makes the rbind fail. This column is not important to my analysis and is a leftover from earlier data processing. Is there a way to go through the list and either remove the column from each data frame that has it, or add an empty column in the correct position to those that don't?

The alternative is going through over 2,000 files and adding or removing the column manually.
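A minimal sketch of the first option (dropping the column wherever it exists before binding), assuming purrr and magrittr are installed. The two temporary files, their column names other than LaserEnergy, and the helper drop_laser() are hypothetical stand-ins for the real data, not part of the original code.

```r
library(magrittr)  # %>% pipe
library(purrr)     # map(), reduce()

# Hypothetical toy inputs: the first file has LaserEnergy, the second does not.
dir <- tempdir()
paths <- file.path(dir, c("run1.csv", "run2.csv"))
write.csv(data.frame(time = 1:2, LaserEnergy = c(0.5, 0.7), signal = c(10, 12)),
          paths[1], row.names = FALSE)
write.csv(data.frame(time = 3:4, signal = c(11, 13)),
          paths[2], row.names = FALSE)

# Hypothetical helper: keep every column except LaserEnergy.
# setdiff() leaves data frames that never had the column unchanged.
drop_laser <- function(df) df[, setdiff(names(df), "LaserEnergy"), drop = FALSE]

data.files <- paths %>%
  map(read.csv) %>%   # read each file into a list of data frames
  map(drop_laser) %>% # make all column sets identical
  reduce(rbind)       # now rbind no longer fails
```

Dropping is simpler than padding here because the column isn't needed; adding an NA column to the files that lack it would work too, but it also requires matching the column position or relying on name matching.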

Upvotes: 1

Answers (2)

C. Denney

Reputation: 627

This is what ended up working for me. I had to use the data.table package as well:

library(purrr)       ##map(), plus the %>% pipe
library(data.table)  ##rbindlist()
data.files <- paths %>%
   map(read.csv) %>%
   rbindlist(fill = TRUE)  ##rbindlist() is from the data.table package; fill = TRUE fills missing columns with NA

For some reason, read_csv() did not like the column classes when used together with map_dfr() and tried to force some columns into classes they shouldn't have had. I couldn't find anything in the documentation that addressed it (specifying col_types didn't work for me either).
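To illustrate what fill = TRUE does, here is a small sketch using two in-memory data frames in place of the files read from paths (the data and the names with_laser, without_laser, and trimmed are hypothetical). rbindlist() matches columns by name when fill = TRUE and pads the missing LaserEnergy values with NA; the column can then be excluded afterwards.

```r
library(data.table)

# Hypothetical stand-ins for two files: only the first has LaserEnergy.
with_laser    <- data.frame(time = 1:2, LaserEnergy = c(0.5, 0.7), signal = c(10, 12))
without_laser <- data.frame(time = 3:4, signal = c(11, 13))

# fill = TRUE binds by column name and fills the absent column with NA.
combined <- rbindlist(list(with_laser, without_laser), fill = TRUE)

# Keep every column except LaserEnergy (with = FALSE treats j as column names).
trimmed <- combined[, setdiff(names(combined), "LaserEnergy"), with = FALSE]
```

Rows 3 and 4 of combined have NA in LaserEnergy, while trimmed contains only time and signal.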

Upvotes: 1

Calum You

Reputation: 15072

Something like this? Without example data it's hard to tell exactly what will work, but purrr::map_dfr(), which is shorthand for map() followed by bind_rows(), should avoid the error. bind_rows() does not throw an error when a column is missing from some of the list elements; it just fills those rows with NA. You can then drop the unwanted column from the combined data frame.

library(tidyverse)
data.files <- paths %>%
    map_dfr(read_csv) %>%
    select(-LaserEnergy)
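A small sketch of the behaviour described above, assuming dplyr is installed; the two tibbles (with_laser, without_laser) are hypothetical stand-ins for files returned by read_csv(). It also swaps select(-LaserEnergy) for select(-any_of("LaserEnergy")), which, unlike the bare column name, does not error if no file happens to contain the column.

```r
library(dplyr)

# Hypothetical stand-ins for two files: only the first has LaserEnergy.
with_laser    <- tibble(time = 1:2, LaserEnergy = c(0.5, 0.7), signal = c(10, 12))
without_laser <- tibble(time = 3:4, signal = c(11, 13))

# bind_rows() matches columns by name and fills LaserEnergy with NA for the second tibble.
combined <- bind_rows(with_laser, without_laser)

# any_of() tolerates the column being absent entirely.
trimmed <- combined %>% select(-any_of("LaserEnergy"))
```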

Upvotes: 6
