Carlo
Carlo

Reputation: 155

How to merge files in a directory with r?

Good afternoon,

have a folder with 231 .csv files and I would like to merge them in R. Each file is one spectrum with 2 columns (Wavenumber and Reflectance), but as they come from the spectrometer they don't have colnames. So they look like this when I import them:

C_Sycamore = read.csv("@C_SC_1_10 average.CSV", header = FALSE)
head(C_Sycamore)

        V1            V2
1 399.1989 7.750676e+001
2 401.1274 7.779499e+001
3 403.0559 7.813432e+001
4 404.9844 7.837078e+001
5 406.9129 7.837600e+001
6 408.8414 7.822227e+001

The first column (Wavenumber) is identical in all 231 files and all spectra contain exactly 1869 rows. Therefore, it should be possible to merge the whole folder in one big dataframe, right? At least this would very practical for me.

So what I tried is this. I set the working directory to the according folder. Define an empty variable d. Store all the file names in file.list. And the loop through the names in the file.list. First, I want to change the colnames of every file to "Wavenumber" and "the according file name itself", so I use deparse(substitute(i)). Then, I want to read in the file and merge it with the others. And then I could probably do merge(d, read.csv(i, header = FALSE, by = "Wavenumber"), but I don't even get this far.

d = NULL
file.list = list.files()

for(i in file.list){
  colnames(i) = c("Wavenumber", deparse(substitute(i)))
  d = merge(d, read.csv(i, header = FALSE))
}

When I run this I get the error code

"Error in colnames<-(*tmp*, value = c("Wavenumber", deparse(substitute(i)))) :

So I tried running it without the "colnames()" line, which does not produce an error code, but doesn't work either. Instead of my desired dataframe I get am empty dataframe with only two columns and the message:

"reread"@S_BE_1_10 average.CSV" "@S_P_1_10 average.CSV""

This kind of programming is new to me. So I am thankful for all useful suggestions. Also I am happy to share more data if it helps.

Thanks in advance.

Upvotes: 2

Views: 3672

Answers (2)

Edo
Edo

Reputation: 7818

Solution

library(tidyr)
library(purrr)

path <- "your/path/to/folder"


# in one pipeline:
C_Sycamore  <- path %>% 
  
  # get csvs full paths. (?i) is for case insentitive
  list.files(pattern = "(?i)\\.csv$", full.names = TRUE) %>% 
  
  # create a named vector: you need it to assign ids in the next step.
  # and remove file extection to get clean colnames
  set_names(tools::file_path_sans_ext(basename(.))) %>% 

  # read file one by one, bind them in one df and create id column 
  map_dfr(read.csv, col.names = c("Wavenumber", "V2"), .id = "colname") %>%
  
  # pivot to create one column for each .id
  pivot_wider(names_from = colname, values_from = V2)

Explanation

I would suggest not to change the working directory. I think it's better if you read from that folder instead.

You can read each CSV file in a loop and bind them together by row. You can use map_dfr to loop over each item and then bind every dataframe by row (that's what the _dfr stands for).

Note that I've used .id = to create a new column called colname. It gets populated out of the names of the vector you're looping over. (That's why we added the names with set_names)

Then, to have one row for each Wavenumber, you need to reshape your data. You can use pivot_wider.

You will have at the end a dataframe with as many rows as Wavenumber and as many columns as the number of CSV plus 1 (the wavenumber column).


Reproducible example

To double check my results, you can use this reproducible example:

path <- tempdir()

csv <- "399.1989,7.750676e+001
401.1274,7.779499e+001
403.0559,7.813432e+001
404.9844,7.837078e+001
406.9129,7.837600e+001
408.8414,7.822227e+001"

write(csv, file.path(path, "file1.csv"))
write(csv, file.path(path, "file2.csv"))

You should expect this output:

C_Sycamore
#> # A tibble: 5 x 3
#>   Wavenumber file1 file2
#>        <dbl> <dbl> <dbl>
#> 1       401.  77.8  77.8
#> 2       403.  78.1  78.1
#> 3       405.  78.4  78.4
#> 4       407.  78.4  78.4
#> 5       409.  78.2  78.2

Thanks a lot to @Konrad Rudolph for the suggestions!!

Upvotes: 4

d3hero23
d3hero23

Reputation: 392

no need for a loop here simply use lapply.

first set your working directory to file location###

library(dplyr)
files_to_upload<-list.files(, pattern = "*.csv")
theData_list<-lapply(files_to_upload, read.csv)
C_Sycamore <-bind_rows(theData_list)

Upvotes: 0

Related Questions