Mussa

Reputation: 117

Adding file name column to table as multiple files are read and merged

I have several .csv files in a folder, all with the same column names.

I want to merge them and add the name of each file as the first column.

I have tried:

filenames <- list.files(getwd(), full.names = FALSE, pattern = ".csv", recursive = TRUE)
sites <- str_extract(filenames, ".csv")  # same length as filenames

library(purrr)
library(dplyr)
library(readr)
stopifnot(length(filenames)==length(sites))  # returns error if not the same length
ans <- map2(filenames, sites, ~read_csv(.x) %>% mutate(id = .y))  # .x is element in filenames, and .y is element in sites

This reads the files, but the first column is a row number rather than the file name, like this:

> ans
[[1]]
# A tibble: 30,675 x 15
      X1 CHROM    POS REF   ALT   NORMAL TUMOR Depth T_REF_COUNT T_ALT_COUNT N_REF_COUNT
   <dbl> <chr>  <dbl> <chr> <chr> <chr>  <chr> <dbl>       <dbl>       <dbl>       <dbl>
 1     1 M     1.04e4 G     A     3235:… 6860…  6874        1188        5646        3216
 2     2 M     1.48e4 G     A     3147:… 6504…  6584        6249         210        3128

I also tried the following, but it fails because the CHROM column is read as character in some files and as double (numeric) in others:

customized_read_csv <- function(file){
    read_csv(file) %>%
        mutate(fileName = file)
}

list.files(full.names = TRUE) %>% # list all the files
    lapply(customized_read_csv) %>% # read them all in with our custom function
    reduce(bind_rows) %>% # stack them all on top of each other
    select(CHROM, fileName, N_VAF) %>% # select the correct columns
    pivot_wider(names_from = fileName, values_from = N_VAF) # and switch from "long format" to "wide format"

Error: Can't combine `..1$CHROM` <character> and `..13$CHROM` <double>.

Any help would be appreciated.

EDITED

My files look like this:

[image: my file]

> ans <- map2_df(filenames, sites, ~read_csv(.x) %>%
+                    mutate(CHROM = as.character(CHROM), id = .y))
Error: Mapped vectors must have consistent lengths:
* `.x` has length 40
* `.y` has length 38

And the final table looks like this:

> head(ans)
# A tibble: 6 x 10
     X1 CHROM     POS REF   ALT   T_ALT_COUNT N_REF_COUNT N_ALT_COUNT id     T_REF_COUNT
  <dbl> <chr>   <dbl> <chr> <chr>       <dbl>       <dbl>       <dbl> <chr>        <dbl>
1     1 chrM     8620 C     A            2161         607           1 LP200…          NA
2     2 chr1   983023 C     T               9          31           0 LP200…          NA
3     3 chr1  1205584 T     A               9          23           0 LP200…          NA
4     4 chr1  1495120 T     G               7          29           0 LP200…          NA
5     5 chr1  1772044 C     T              16          39           1 LP200…          NA
6     6 chr1  2302194 G     T              10          20           0 LP200…          NA
> 

How can I avoid this, please?

Upvotes: 1

Views: 390

Answers (1)

Ronak Shah

Reputation: 388982

You can use sub() with basename() to get the file name from each path. It seems the CHROM column is read as numeric in some files, so convert it to character explicitly before the files are combined. Try:

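library(dplyr)
library(purrr)
library(readr)  # provides read_csv()

# Drop the directory and the .csv extension to get one label per file
sites <- sub('\\.csv$', '', basename(filenames))

# Read each file, coerce CHROM to character, tag rows with the label, then row-bind
ans <- map2_df(filenames, sites, ~read_csv(.x) %>%
                                   mutate(CHROM = as.character(CHROM), id = .y))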
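For the follow-up problems in the question (mismatched lengths and the label not being the first column), here is a minimal self-contained sketch of one possible approach. The folder, the file names sampleA.csv and sampleB.csv, and the helper read_one() are hypothetical and exist only for this demonstration. The idea is to build sites from the same filenames vector right before mapping, so the two cannot differ in length, and to move id to the front with select().

```r
library(readr)
library(dplyr)
library(purrr)

# Hypothetical demo data: two small files in a temporary folder,
# one where CHROM looks numeric and one where it is character.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(tibble(CHROM = c(1, 2), POS = c(100, 200)),
          file.path(demo_dir, "sampleA.csv"))
write_csv(tibble(CHROM = c("M", "X"), POS = c(300, 400)),
          file.path(demo_dir, "sampleB.csv"))

# Anchor the pattern so only names ending in ".csv" match.
filenames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Derive the labels from filenames itself, so both vectors have the same length.
sites <- sub("\\.csv$", "", basename(filenames))

# Hypothetical helper: read one file and put its label in the first column.
read_one <- function(path, site) {
  # Force CHROM to character at read time; other columns are guessed as usual.
  read_csv(path,
           col_types = cols(CHROM = col_character(), .default = col_guess())) %>%
    mutate(id = site) %>%
    select(id, everything())
}

# Read every file and stack the results into one table.
ans <- map2(filenames, sites, read_one) %>% bind_rows()
```

Setting the column type in read_csv() instead of converting afterwards is a design choice, not a requirement; as.character() inside mutate() works as well. If filenames is recomputed (for example after new files are added), sites has to be recomputed too, otherwise the lengths can drift apart.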

Upvotes: 2
