photosynthesis

Reputation: 95

Need to read some (or all) columns as character when combining multiple .csv files with readr and purrr

I'm reading many large .csv files with identical column names and row-binding them using the following code (as suggested at https://serialmentor.com/blog/2016/6/13/reading-and-combining-many-tidy-data-files-in-R):

library(readr)  # for read_csv()
library(purrr)  # for map(), reduce(), and the %>% pipe

# find all file names ending in .csv
# (pattern is a regular expression, not a glob)
files <- dir(pattern = "\\.csv$")
files

data <- files %>%
  map(read_csv) %>%    # read in all the files individually, using
                       # the function read_csv() from the readr package
  reduce(rbind)        # reduce with rbind into one data frame
data

However, my data has one column that needs to be read in as character: its entries are strings of numbers separated by ",", and otherwise read_csv() parses that column as numeric and drops the commas.

How can I

1.) Specify that just the one column (preferably by name) is read in as character?

or

2.) Simply read in all columns as character?

This second option is not ideal, since then I have to change many columns back to numeric.

I tried using:

col_types = cols(.default = "c")

as discussed at https://github.com/tidyverse/readr/issues/148 and https://github.com/tidyverse/readr/issues/292.

My approach was this:

data <- files %>%
   map(read_csv( col_types = cols(.default = "c" ))) %>%
   reduce(rbind)   
data

However, this doesn't work: read_csv() is called immediately, before map() can hand it anything, so it is missing its 'file' argument (i.e. the .csv file path). It throws this error:

Error in read_delimited(file, tokenizer, col_names = col_names, col_types = col_types,  : 
  argument "file" is missing, with no default

Upvotes: 0

Views: 4568

Answers (1)

photosynthesis

Reputation: 95

This works when every .csv file has the same column names (nine in my case, but any number will do) and only two columns (in this case "scan_start" and "scan_end") should be read as numeric, with all the others read as character:

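library(readr)  # read_csv(), cols()
library(purrr)  # map_df() and the %>% pipe

files <- dir(pattern = "\\.csv$")  # pattern is a regular expression

# read every file with the same column spec and row-bind the results
metadata <- files %>%
  map_df(~ read_csv(.x, col_types = cols(.default = "c",
    scan_end = "n", scan_start = "n")))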
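For the first option in the question (forcing only one column, by name, to character and letting readr guess the rest), a minimal sketch could look like the following. The two example files, the column name `codes`, and the helper `read_one()` are made up for illustration and are not from the original data; it assumes that naming a single column in `cols()` leaves the remaining columns to readr's usual type guessing.

```r
library(readr)  # read_csv(), cols(), col_character()
library(purrr)  # map()

# Write two small example files to a temporary directory (made-up data).
tmp <- file.path(tempdir(), "csv_demo")
dir.create(tmp, showWarnings = FALSE)
writeLines(c("id,codes,value", "1,\"10,20\",1.5"), file.path(tmp, "a.csv"))
writeLines(c("id,codes,value", "2,\"30,40\",2.5"), file.path(tmp, "b.csv"))

# Collect the full paths of the example files.
files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: only `codes` is forced to character;
# the types of all other columns are guessed by readr.
read_one <- function(path) {
  read_csv(path, col_types = cols(codes = col_character()))
}

# Read each file with the same column spec, then row-bind the results.
data <- do.call(rbind, map(files, read_one))
data
```

Wrapping read_csv() in a function (or a `~` lambda, as above) is what avoids the "argument "file" is missing" error: map() then calls the function once per file path instead of read_csv() being evaluated up front with no file.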

Upvotes: 2
