PesKchan

Reputation: 968

Average of multiple data frames in R having same column pattern

I have multiple data frames (files). For each one, I would like to average groups of columns and write the result back out.

All my data frames follow the same pattern. This is how the column headers are named in every file:

names(WGCNA_avg_gene)
 [1] "Family"    "Symbol"    "C1_S1_S7"  "C1_S3_S9"  "C2_S1_S10" "C2_S2_S11" "C3_S1_S13" "C3_S2_S14" "C3_S3_S15"
[10] "C4_S1_S16" "C4_S2_S17" "C4_S3_S18" "C5_S1_S19" "C5_S2_S20" "C5_S3_S21" "C6_S1_S22" "C6_S2_S23" "C6_S3_S24"

So far, this is how I'm doing it for a single data frame:

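library(dplyr)

# e is the input data frame; each new column is the row-wise mean of the
# replicate columns whose names contain the given condition label
WGCNA_avg_gene  <- e %>% mutate(C1 = rowMeans(.[grep("C1", names(.))]), 
                                C2 = rowMeans(.[grep("C2", names(.))]),
                                C3 = rowMeans(.[grep("C3", names(.))]),
                                C4 = rowMeans(.[grep("C4", names(.))]),
                                C5 = rowMeans(.[grep("C5", names(.))]),
                                C6 = rowMeans(.[grep("C6", names(.))]))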

names(WGCNA_avg_gene)

one = WGCNA_avg_gene %>% select(Family,Symbol,C1,C2,C3,C4,C5,C6)
names(one)[2] = "Gene"

I'm computing the averages for each data frame, subsetting it, and then writing it back.

As I understand it, the same steps need to be applied to every file after it has been read.

I can read the files into a list, but I'm not sure how to apply the averaging above, which I compute for an individual data frame, to every element of that list.

Any help would be really appreciated.

Upvotes: 1

Views: 188

Answers (1)

Ronak Shah

Reputation: 388797

List all the files that you want to read with list.files, use lapply to read each one, split its columns into groups based on the column-name prefix, and take the row means of each group.

list_of_files <- list.files('csv/folder/',pattern = '\\.csv$', full.names = TRUE)

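result <- lapply(list_of_files, function(x) {
    tmp <- read.csv(x)
    # drop the first two (annotation) columns, keep the numeric ones
    t1 <- tmp[-(1:2)]
    # group columns by the part of the name before the first "_" (C1, C2, ...)
    # and take the row means within each group
    cbind(tmp[1:2], sapply(split.default(t1, 
          sub('_.*', '', names(t1))), rowMeans, na.rm = TRUE))
})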
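
To see what the grouping step does, here is a minimal sketch on a single toy data frame. The data frame `toy` and its values are made up for illustration; only the column-naming pattern is taken from the question.

```r
# Toy data frame mimicking the column layout in the question (values are invented)
toy <- data.frame(
  Family    = c("F1", "F2"),
  Symbol    = c("G1", "G2"),
  C1_S1_S7  = c(1, 2),
  C1_S3_S9  = c(3, 4),
  C2_S1_S10 = c(10, 20),
  C2_S2_S11 = c(30, NA)
)

t1 <- toy[-(1:2)]                     # drop the two annotation columns
groups <- sub("_.*", "", names(t1))   # keep only the prefix before the first "_"

# split.default() splits the data frame by columns, one piece per prefix;
# rowMeans() is then applied to each piece, ignoring missing values
avg <- sapply(split.default(t1, groups), rowMeans, na.rm = TRUE)

out <- cbind(toy[1:2], avg)
print(out)
```

Each prefix (`C1`, `C2`) becomes one averaged column next to `Family` and `Symbol`. Because of `na.rm = TRUE`, a row with a missing replicate is averaged over the remaining values.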

If you want to write the result to a new CSV file for each input file:

lapply(list_of_files, function(x) {
  tmp <- read.csv(x)
  t1 <- tmp[-(1:2)]
  result <- cbind(tmp[1:2], sapply(split.default(t1, 
                   sub('_.*', '', names(t1))), rowMeans, na.rm = TRUE))
  write.csv(result, paste0('result_', basename(x)), row.names = FALSE)
})
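
If the files have already been read into a list of data frames, the same idea can be wrapped in a function and applied with lapply. This is only a sketch: the helper name `average_by_prefix`, its `id_cols` argument, and the example list `dfs` are hypothetical, and it assumes the first two columns are the annotation columns, as in the question.

```r
# Hypothetical helper: average replicate columns that share the prefix
# before the first "_" and keep the annotation columns as they are
average_by_prefix <- function(df, id_cols = 1:2) {
  vals <- df[-id_cols]
  groups <- sub("_.*", "", names(vals))
  # lapply + as.data.frame (instead of sapply) also behaves for one-row inputs
  means <- as.data.frame(lapply(split.default(vals, groups),
                                rowMeans, na.rm = TRUE))
  cbind(df[id_cols], means)
}

# Made-up list standing in for data frames that were already read
dfs <- list(
  a = data.frame(Family = "F1", Symbol = "G1",
                 C1_S1_S7 = 2, C1_S3_S9 = 4, C2_S1_S10 = 6),
  b = data.frame(Family = c("F1", "F2"), Symbol = c("G1", "G2"),
                 C1_S1_S7 = c(1, 3), C1_S3_S9 = c(3, 5),
                 C2_S1_S10 = c(10, NA), C2_S2_S11 = c(20, 40))
)

averaged <- lapply(dfs, average_by_prefix)
print(averaged)
```

The list names are preserved, so each element of `averaged` can then be written out with write.csv in the same way as above.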

Upvotes: 2
