crisprheaven291

Reputation: 1

for loop with dplyr

I have a bunch of files I read in manually as such:

# gel above replicates

    A_gel <- read.delim("XL1_3_S35_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    B_gel <- read.delim("XL2_3_S37_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    C_gel <- read.delim("XL2_3_S37_L004_R1_001_w_XL1_3_S35_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    D_gel <- read.delim("XL1_3_S35_L004_R1_001_w_XL1_3_S35_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
# gel below replicates
    
    A_below_gel <- read.delim("XL1_3b_S36_L004_R1_001_w_XL2_3b_S38_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    B_below_gel <- read.delim("XL2_3b_S38_L004_R1_001_w_XL2_3b_S38_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    C_below_gel <- read.delim("XL2_3b_S38_L004_R1_001_w_XL1_3b_S36_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    D_below_gel <- read.delim("XL1_3b_S36_L004_R1_001_w_XL1_3b_S36_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")

I would like to rename the columns of each of these files and sort the rows by the Start column, with something like this:

    colnames(A_gel) <- c("Chromosome", "Start", "End", "LogPVal", "LogFC", "Strand")
    
    A_gel <- A_gel %>%
      arrange(A_gel$Start)

Instead of repeating this for every file, I would like to use a for loop over all files in R.

Upvotes: 0

Views: 100

Answers (1)

Konrad Rudolph

Reputation: 546073

Never create multiple variables following the same pattern. The properly supported solution for this general problem is to use lists: instead of having variables A_gel, B_gel, …, you have one variable gel, which is a list that contains your individual data.frames. You can also assign names to these individual items, though in your case that doesn’t seem necessary.

Then you can use e.g. lapply to run over your file paths and read the data of the different files into that list:

gel = lapply(gel_filenames, read.delim)
below_gel = lapply(below_gel_filenames, read.delim)

… and likewise you can put your arrangement code into a function and apply that, changing the above to:

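library(dplyr)  # provides %>% and arrange()

# Read one file, name its six columns and sort the rows by start position
read_bed = function (filename) {
    read.delim(filename) %>%
        setNames(c("Chromosome", "Start", "End", "LogPVal", "LogFC", "Strand")) %>%
        arrange(Start)
}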

# …

gel = lapply(gel_filenames, read_bed)
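
The vectors gel_filenames and below_gel_filenames used above are not defined in the answer. Here is a minimal sketch of one way to build them. It assumes that all files sit in one directory and that "_3b_" in the name marks the below-gel replicates, as the filenames in the question suggest. The temporary directory and the empty placeholder files are hypothetical and only there so the sketch runs on its own.

```r
# Hypothetical setup: empty placeholder files, named like two of the files
# in the question, inside a temporary directory.
bed_dir = tempfile("bed_")
dir.create(bed_dir)
example_names = c(
    "XL1_3_S35_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed",
    "XL1_3b_S36_L004_R1_001_w_XL2_3b_S38_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed"
)
invisible(file.create(file.path(bed_dir, example_names)))

# Collect every matching file, then split on the assumed "_3b_" marker.
all_filenames = list.files(bed_dir, pattern = "\\.compressed\\.bed$", full.names = TRUE)
is_below = grepl("_3b_", basename(all_filenames), fixed = TRUE)

gel_filenames = all_filenames[! is_below]
below_gel_filenames = all_filenames[is_below]
```

With real data, bed_dir would point at the directory that holds the files, and the placeholder setup would be dropped.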

Better yet, use purrr::map_dfr to read all data into a single combined table:

library(purrr)  # provides map_dfr()

gel = gel_filenames %>%
    setNames(., .) %>%
    map_dfr(read_bed, .id = 'Filename')

(The setNames(., .) step is necessary since map_dfr fills the added ID column with the names of the input vector; for an unnamed vector it would use the positions instead.)
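
Since purrr 1.0.0, map_dfr is marked as superseded, and the purrr documentation suggests map() followed by list_rbind() instead. Here is a minimal sketch of that variant. The two tiny placeholder files and their contents are made up for illustration. They include a header row, because read.delim treats the first line as column names by default; header-less BED files would need header = FALSE.

```r
library(dplyr)
library(purrr)

read_bed = function (filename) {
    read.delim(filename) %>%
        setNames(c("Chromosome", "Start", "End", "LogPVal", "LogFC", "Strand")) %>%
        arrange(Start)
}

# Hypothetical placeholder data: two identical tab-separated files with a header.
demo_dir = tempfile("bed_demo_")
dir.create(demo_dir)
demo_table = data.frame(
    chr = c("chr1", "chr1"), start = c(500L, 100L), end = c(600L, 200L),
    p = c(3.2, 5.1), fc = c(1.5, 2.0), strand = c("+", "-")
)
demo_filenames = file.path(demo_dir, c("a.bed", "b.bed"))
for (f in demo_filenames) {
    write.table(demo_table, f, sep = "\t", quote = FALSE, row.names = FALSE)
}

# Read each file into a named list, then stack the tables;
# names_to stores the list names (here: the file paths) in a new column.
gel = demo_filenames %>%
    setNames(., .) %>%
    map(read_bed) %>%
    list_rbind(names_to = "Filename")
```

The result has one row per input row across all files, sorted by Start within each file, plus the Filename column.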

This will create one master table for the “GEL” data, which has an added ID column for the original filename (you’ll probably want to extract just some ID from that, using tidyr::extract).
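
As a rough illustration of the tidyr::extract step, the sketch below pulls the two sample identifiers on either side of "_w_" out of the filename. The regular expression is guessed from the filenames in the question, and the column names FirstID and SecondID are hypothetical, as is the small stand-in table. The call is written as tidyr::extract because magrittr exports a function of the same name.

```r
library(tidyr)

# Stand-in for the combined table; only the Filename column matters here.
gel = data.frame(
    Filename = c(
        "XL1_3_S35_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed",
        "XL2_3_S37_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed"
    ),
    Start = c(100L, 200L)
)

# Each capture group becomes one new column; remove = FALSE keeps Filename.
gel_with_ids = tidyr::extract(
    gel, Filename,
    into = c("FirstID", "SecondID"),
    regex = "(XL[0-9]+_3b?_S[0-9]+)_.*_w_(XL[0-9]+_3b?_S[0-9]+)_",
    remove = FALSE
)
```

Rows whose filename does not match the pattern get NA in the new columns, which makes it easy to spot a pattern that is too strict.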

Upvotes: 4
