I have several .csv files in a folder, all with the same column names.
I want to merge them and add the name of each file as the first column.
I have tried:
filenames <- list.files(getwd(), full.names = FALSE, pattern = ".csv", recursive = TRUE)
sites <- str_extract(filenames, ".csv") # same length as filenames
library(purrr)
library(dplyr)
library(readr)
stopifnot(length(filenames)==length(sites)) # returns error if not the same length
ans <- map2(filenames, sites, ~read_csv(.x) %>% mutate(id = .y)) # .x is element in filenames, and .y is element in sites
With this the files are read in, but the first column is a row number rather than the file name, like:
> ans
[[1]]
# A tibble: 30,675 x 15
X1 CHROM POS REF ALT NORMAL TUMOR Depth T_REF_COUNT T_ALT_COUNT N_REF_COUNT
<dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 M 1.04e4 G A 3235:… 6860… 6874 1188 5646 3216
2 2 M 1.48e4 G A 3147:… 6504… 6584 6249 210 3128
I also tried this, but it fails because the CHROM column is read as character in some files and as double (numeric) in others:
customized_read_csv <- function(file){
read_csv(file) %>%
mutate(fileName = file)
}
list.files(full.names = TRUE) %>% # list all the files
lapply(customized_read_csv) %>% # read them all in with our custom function
reduce(bind_rows) %>% # stack them all on top of each other
select(CHROM, fileName, N_VAF) %>% # select the correct columns
pivot_wider(names_from = fileName, values_from = N_VAF) # and switch from "long format" to "wide format"
Error: Can't combine `..1$CHROM` <character> and `..13$CHROM` <double>.
Any help would be appreciated.
EDITED
My files look
ans <- map2_df(filenames, sites, ~read_csv(.x) %>%
+ mutate(CHROM = as.character(CHROM), id = .y))
Error: Mapped vectors must have consistent lengths:
* `.x` has length 40
* `.y` has length 38
And the final result looks like this:
> head(ans)
# A tibble: 6 x 10
X1 CHROM POS REF ALT T_ALT_COUNT N_REF_COUNT N_ALT_COUNT id T_REF_COUNT
<dbl> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl>
1 1 chrM 8620 C A 2161 607 1 LP200… NA
2 2 chr1 983023 C T 9 31 0 LP200… NA
3 3 chr1 1205584 T A 9 23 0 LP200… NA
4 4 chr1 1495120 T G 7 29 0 LP200… NA
5 5 chr1 1772044 C T 16 39 1 LP200… NA
6 6 chr1 2302194 G T 10 20 0 LP200… NA
>
How can I avoid this, please?
You can use sub to strip the extension and get the file name. It seems the CHROM column is read as numeric in certain files, so we can convert it to character explicitly. Try:
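library(dplyr)
library(purrr)
library(readr) # provides read_csv()
# drop the directory and the .csv extension to get one id per file
sites <- sub('\\.csv$', '', basename(filenames))
# read each file, force CHROM to character, and tag every row with the file's id
ans <- map2_df(filenames, sites, ~read_csv(.x) %>%
mutate(CHROM = as.character(CHROM), id = .y))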
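A self-contained sketch of the same idea may make it easier to test. It assumes two made-up demo files (sampleA.csv and sampleB.csv, written to a temporary folder) standing in for your data, and a hypothetical helper read_one that is not part of any package. It differs from the code above in three small ways that you may or may not want: the list.files pattern is anchored so only names ending in .csv match, CHROM is declared as character at read time through col_types, and the id column is moved to the front. Because sites is derived directly from filenames, the two vectors always have the same length, which should avoid the "Mapped vectors must have consistent lengths" error.

```r
library(readr)
library(dplyr)
library(purrr)

# Hypothetical demo data: one file where CHROM looks numeric, one where it is text.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(tibble(CHROM = c(1, 2), POS = c(100, 200)),
          file.path(demo_dir, "sampleA.csv"))
write_csv(tibble(CHROM = c("chrM", "chr1"), POS = c(8620, 983023)),
          file.path(demo_dir, "sampleB.csv"))

# Anchored pattern: match only file names that end in ".csv".
filenames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Derive the ids from filenames so both vectors have the same length.
sites <- sub("\\.csv$", "", basename(filenames))

# Hypothetical helper: read one file, force CHROM to character,
# add the id, and move it to the first position.
read_one <- function(path, site) {
  read_csv(path, col_types = cols(CHROM = col_character())) %>%
    mutate(id = site) %>%
    select(id, everything())
}

# Read every file and stack the results row-wise.
combined <- map2(filenames, sites, read_one) %>%
  bind_rows()

print(combined)
```

Declaring the type in col_types means the column never gets guessed as double in the first place, so bind_rows has nothing to reconcile. If other columns also differ in type between files, they would need the same treatment.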