Reputation: 627
I have a piece of code that reads in many dataframes and then rbinds them together:
data.files = paths %>%  ## takes the names of all the objects that I want to read in
  map(read.csv) %>%     ## reads all the correctly named .csv files into a list
  reduce(rbind)         ## reduces the list into a single dataframe by rbind
where paths is a vector of the names of the .csv files to be read in. However, the problem is that many of these files are missing a single column, LaserEnergy, which makes the rbind fail. The column is not important to my analysis and is a leftover from earlier data processing. Is there a way to go through the list and either remove the column from each object that has it, or add an empty column in the correct position to those that don't?

The alternative is going through over 2000 files and adding or removing the column by hand.
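To be clear about what I'm after, something along these lines is what I'm imagining (an untested sketch, assuming dplyr is loaded and that any_of() is the right way to drop the column only where it exists):

data.files = paths %>%
  map(read.csv) %>%
  map(~ select(.x, -any_of("LaserEnergy"))) %>%  ## drop LaserEnergy only from files that have it
  reduce(rbind)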
Upvotes: 1
Views: 324
Reputation: 627
This is what ended up working for me; I had to use the data.table package as well:
library(tidyverse)   ## for %>% and map()
library(data.table)  ## for rbindlist()
data.files <- paths %>%
  map(read_csv) %>%
  rbindlist(fill = TRUE)  ## rbindlist() is from data.table; fill = TRUE fills columns missing from some files with NA
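Because LaserEnergy isn't needed for the analysis, it can then be dropped from the combined table; a minimal sketch, assuming the result is a data.table and the column exists after the fill:

data.files[, LaserEnergy := NULL]  ## delete the column by reference; it exists here because fill = TRUE added it as NA where missing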
For some reason, read_csv did not like the column classes when used in conjunction with map_dfr() and was trying to force columns into classes they shouldn't have had. I couldn't find anything in the documentation that would address it (trying to specify col_types didn't work for me).
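For anyone trying the same thing, this is roughly what specifying col_types looks like (LaserEnergy as a double is only a guess at its intended class); in my case it still didn't fix the coercion:

data.files <- paths %>%
  map_dfr(~ read_csv(.x, col_types = cols(LaserEnergy = col_double())))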
Upvotes: 1
Reputation: 15072
Something like this? Without example data it's hard to tell exactly what will work, but using purrr::map_dfr, which is shorthand for map followed by bind_rows, should avoid the error. bind_rows will not throw an error when a column is missing from some list elements; it just fills it with NA. You can then drop the unwanted column from the resulting dataframe.
library(tidyverse)

data.files <- paths %>%
  map_dfr(read_csv) %>%
  select(-LaserEnergy)
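A small defensive variant, in case an entire batch of files lacks LaserEnergy (a bare select(-LaserEnergy) would then error because the column wouldn't exist at all), is to drop it with any_of():

data.files <- paths %>%
  map_dfr(read_csv) %>%
  select(-any_of("LaserEnergy"))  ## drops the column only if it is present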
Upvotes: 6