Reputation: 155
Good afternoon,
have a folder with 231 .csv files and I would like to merge them in R. Each file is one spectrum with 2 columns (Wavenumber and Reflectance), but as they come from the spectrometer they don't have colnames. So they look like this when I import them:
C_Sycamore = read.csv("@C_SC_1_10 average.CSV", header = FALSE)
head(C_Sycamore)
V1 V2
1 399.1989 7.750676e+001
2 401.1274 7.779499e+001
3 403.0559 7.813432e+001
4 404.9844 7.837078e+001
5 406.9129 7.837600e+001
6 408.8414 7.822227e+001
The first column (Wavenumber) is identical in all 231 files and all spectra contain exactly 1869 rows. Therefore, it should be possible to merge the whole folder in one big dataframe, right? At least this would very practical for me.
So what I tried is this. I set the working directory to the according folder. Define an empty variable d. Store all the file names in file.list
. And the loop through the names in the file.list
. First, I want to change the colnames of every file to "Wavenumber" and "the according file name itself", so I use deparse(substitute(i))
. Then, I want to read in the file and merge it with the others. And then I could probably do merge(d, read.csv(i, header = FALSE, by = "Wavenumber")
, but I don't even get this far.
d = NULL
file.list = list.files()
for(i in file.list){
colnames(i) = c("Wavenumber", deparse(substitute(i)))
d = merge(d, read.csv(i, header = FALSE))
}
When I run this I get the error code
"Error in colnames<-
(*tmp*
, value = c("Wavenumber", deparse(substitute(i)))) :
So I tried running it without the "colnames()" line, which does not produce an error code, but doesn't work either. Instead of my desired dataframe I get am empty dataframe with only two columns and the message:
"reread"@S_BE_1_10 average.CSV" "@S_P_1_10 average.CSV""
This kind of programming is new to me. So I am thankful for all useful suggestions. Also I am happy to share more data if it helps.
Thanks in advance.
Upvotes: 2
Views: 3672
Reputation: 7818
library(tidyr)
library(purrr)
path <- "your/path/to/folder"
# in one pipeline:
C_Sycamore <- path %>%
# get csvs full paths. (?i) is for case insentitive
list.files(pattern = "(?i)\\.csv$", full.names = TRUE) %>%
# create a named vector: you need it to assign ids in the next step.
# and remove file extection to get clean colnames
set_names(tools::file_path_sans_ext(basename(.))) %>%
# read file one by one, bind them in one df and create id column
map_dfr(read.csv, col.names = c("Wavenumber", "V2"), .id = "colname") %>%
# pivot to create one column for each .id
pivot_wider(names_from = colname, values_from = V2)
I would suggest not to change the working directory. I think it's better if you read from that folder instead.
You can read each CSV file in a loop and bind them together by row. You can use map_dfr
to loop over each item and then bind every dataframe by row (that's what the _dfr
stands for).
Note that I've used .id =
to create a new column called colname
. It gets populated out of the names of the vector you're looping over. (That's why we added the names with set_names
)
Then, to have one row for each Wavenumber
, you need to reshape your data. You can use pivot_wider
.
You will have at the end a dataframe with as many rows as Wavenumber and as many columns as the number of CSV plus 1 (the wavenumber column).
To double check my results, you can use this reproducible example:
path <- tempdir()
csv <- "399.1989,7.750676e+001
401.1274,7.779499e+001
403.0559,7.813432e+001
404.9844,7.837078e+001
406.9129,7.837600e+001
408.8414,7.822227e+001"
write(csv, file.path(path, "file1.csv"))
write(csv, file.path(path, "file2.csv"))
You should expect this output:
C_Sycamore
#> # A tibble: 5 x 3
#> Wavenumber file1 file2
#> <dbl> <dbl> <dbl>
#> 1 401. 77.8 77.8
#> 2 403. 78.1 78.1
#> 3 405. 78.4 78.4
#> 4 407. 78.4 78.4
#> 5 409. 78.2 78.2
Thanks a lot to @Konrad Rudolph for the suggestions!!
Upvotes: 4
Reputation: 392
no need for a loop here simply use lapply.
library(dplyr)
files_to_upload<-list.files(, pattern = "*.csv")
theData_list<-lapply(files_to_upload, read.csv)
C_Sycamore <-bind_rows(theData_list)
Upvotes: 0