Reputation: 95
I'm reading many large .csv files with identical column names and row-binding them using the following code (as suggested at https://serialmentor.com/blog/2016/6/13/reading-and-combining-many-tidy-data-files-in-R):
require(readr) # for read_csv()
require(purrr) # for map(), reduce()
# find all file names ending in .csv
files <- dir(pattern = "\\.csv$") # dir() takes a regex, not a glob
files
data <- files %>%
  map(read_csv) %>% # read each file individually with read_csv()
                    # from the readr package
  reduce(rbind)     # row-bind the resulting list into one data frame
data
However, my data has one column that needs to be read in as character: its entries are number strings separated by ",", and otherwise read_csv() parses that column as numeric and drops the commas.
How can I
1.) specify that just the one column (preferably by name) be read in as character?
or
2.) simply read in all columns as character?
The second option is not ideal, since I would then have to convert many columns back to numeric.
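(If everything does come in as character, readr::type_convert() can re-guess the column types afterwards. A minimal sketch; the data frame and the column name "scans" are made up for illustration:)

```r
library(readr)

# hypothetical data frame read with cols(.default = "c"):
# every column is character
df <- tibble::tibble(start_scan = c("1", "2"),
                     scans      = c("1,2,3", "4,5,6"))

# type_convert() re-guesses the types; pinning "scans" to character
# keeps the comma-separated strings from being re-parsed as numbers
df <- type_convert(df, col_types = cols(scans = col_character()))
```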
I tried using:
col_types = cols(.default = "c")
as discussed at https://github.com/tidyverse/readr/issues/148 and https://github.com/tidyverse/readr/issues/292.
My approach was this:
data <- files %>%
  map(read_csv(col_types = cols(.default = "c"))) %>%
  reduce(rbind)
data
However, this doesn't work, because read_csv() is called immediately here rather than being passed to map() as a function, so it has no file to read. It throws this error:
Error in read_delimited(file, tokenizer, col_names = col_names, col_types = col_types, :
argument "file" is missing, with no default
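(For comparison, map() expects the function itself; extra arguments for read_csv() can be forwarded through map()'s ... argument. A sketch of that calling pattern:)

```r
library(readr)
library(purrr)

files <- dir(pattern = "\\.csv$")

# pass read_csv itself to map(); col_types is forwarded via ...
data <- files %>%
  map(read_csv, col_types = cols(.default = "c")) %>%
  reduce(rbind)
```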
Upvotes: 0
Views: 4568
Reputation: 95
Each .csv file has nine (or however many) columns with identical names; only two of them (here "scan_start" and "scan_end") should be read as numeric, and all the others as character:
files <- dir(pattern = "\\.csv$")
metadata <- files %>%
  map_df(~ read_csv(., col_types = cols(.default = "c",
                                        scan_start = "n",
                                        scan_end = "n")))
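The same pattern also covers option 1.) from the question: name only the problem column (the name "scan_list" below is just a placeholder) and let readr guess everything else:

```r
library(readr)
library(purrr)

files <- dir(pattern = "\\.csv$")

# read only the placeholder column "scan_list" as character;
# all other column types are guessed as usual
metadata <- files %>%
  map_df(~ read_csv(., col_types = cols(scan_list = col_character())))
```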
Upvotes: 2