I have a bunch of files that I read in manually, like this:
# gel above replicates
A_gel <- read.delim("XL1_3_S35_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
B_gel <- read.delim("XL2_3_S37_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
C_gel <- read.delim("XL2_3_S37_L004_R1_001_w_XL1_3_S35_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
D_gel <- read.delim("XL1_3_S35_L004_R1_001_w_XL1_3_S35_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
# gel below replicates
A_below_gel <- read.delim("XL1_3b_S36_L004_R1_001_w_XL2_3b_S38_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
B_below_gel <- read.delim("XL2_3b_S38_L004_R1_001_w_XL2_3b_S38_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
C_below_gel <- read.delim("XL2_3b_S38_L004_R1_001_w_XL1_3b_S36_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
D_below_gel <- read.delim("XL1_3b_S36_L004_R1_001_w_XL1_3b_S36_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
I would like to rename the columns of all these data frames and sort each one by the Start column, with something like this:
library(dplyr)
colnames(A_gel) <- c("Chromosome", "Start", "End", "LogPVal", "LogFC", "Strand")
A_gel <- A_gel %>%
    arrange(Start)
Instead of repeating this for every file, I would like to use a for loop in R.
Never create multiple variables following the same pattern. The properly supported solution for this general problem is to use lists: instead of having variables `A_gel`, `B_gel`, …, you have one variable `gel`, which is a list that contains your individual `data.frame`s. You can also assign names to these individual items, though in your case that doesn't seem necessary.
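Then you can use e.g. `lapply` to run over your file paths (here `gel_filenames` and `below_gel_filenames` are character vectors holding the paths of each group of files) and read the data of the different files into that list: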
gel = lapply(gel_filenames, read.delim)
below_gel = lapply(below_gel_filenames, read.delim)
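Since the question asks specifically about a for loop, here is a minimal sketch of the same idea written as a loop that fills a list instead of creating separate variables. To stay self-contained it writes four small, made-up tab-separated files (with a header line, as the question's `read.delim()` calls assume) to a temporary directory. The shortened file names, the data, and the use of `"_3b_"` to tell the "below gel" files apart are assumptions based on the names in the question.

```r
library(dplyr)

# Hypothetical demo files: two "gel above" and two "gel below" replicates
demo_dir <- tempfile("bed_demo_")
dir.create(demo_dir)
demo_names <- c("XL1_3_S35_w_XL2_3_S37.bed", "XL2_3_S37_w_XL1_3_S35.bed",
                "XL1_3b_S36_w_XL2_3b_S38.bed", "XL2_3b_S38_w_XL1_3b_S36.bed")
for (name in demo_names) {
  demo <- data.frame(chr = "chr1", start = c(300, 100, 200), end = c(350, 150, 250),
                     logp = c(5, 3, 4), logfc = c(1.2, 0.8, 2.1), strand = "+")
  write.table(demo, file.path(demo_dir, name), sep = "\t",
              quote = FALSE, row.names = FALSE)
}

# Split the paths into the two groups (assumption: "below gel" names contain "_3b_")
all_files <- list.files(demo_dir, pattern = "\\.bed$", full.names = TRUE)
is_below <- grepl("_3b_", basename(all_files))
gel_filenames <- all_files[!is_below]
below_gel_filenames <- all_files[is_below]

# The for-loop version: one list slot per file instead of one variable per file
gel <- vector("list", length(gel_filenames))
names(gel) <- basename(gel_filenames)
for (i in seq_along(gel_filenames)) {
  bed <- read.delim(gel_filenames[i])
  colnames(bed) <- c("Chromosome", "Start", "End", "LogPVal", "LogFC", "Strand")
  gel[[i]] <- arrange(bed, Start)
}
```

This does the same work as `lapply(gel_filenames, ...)`; `lapply` just hides the pre-allocation and the indexing.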
Likewise, you can put your renaming and sorting code into a function and apply that, changing the above to:
library(dplyr)

read_bed = function (filename) {
    read.delim(filename) %>%
        setNames(c("Chromosome", "Start", "End", "LogPVal", "LogFC", "Strand")) %>%
        arrange(Start)
}
# …
gel = lapply(gel_filenames, read_bed)
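One thing worth checking: `read.delim()` defaults to `header = TRUE`, so if your BED files have no header line (BED files often don't), the first record is silently used as column names and then overwritten by `setNames()`, losing a row. The sketch below is a hypothetical variant, `read_bed_noheader`, that reads such a file with `header = FALSE` and supplies the names through `col.names`; the three-row demo file is made up.

```r
library(dplyr)

# Hypothetical variant of read_bed for files WITHOUT a header line
read_bed_noheader <- function(filename) {
  read.delim(filename, header = FALSE,
             col.names = c("Chromosome", "Start", "End", "LogPVal", "LogFC", "Strand")) %>%
    arrange(Start)
}

# Made-up headerless file with unsorted start positions
demo_file <- tempfile(fileext = ".bed")
writeLines(c("chr1\t300\t350\t5.1\t1.2\t+",
             "chr1\t100\t150\t3.0\t0.8\t-",
             "chr1\t200\t250\t4.2\t2.1\t+"), demo_file)

bed <- read_bed_noheader(demo_file)
bed$Start  # [1] 100 200 300 (all three records kept, sorted by Start)
```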
Better yet, use `purrr::map_dfr` to read all data into a single combined table:
library(purrr)

gel = gel_filenames %>%
    setNames(., .) %>%
    map_dfr(read_bed, .id = 'Filename')
(The `setNames(., .)` step is necessary since `map_dfr` assigns the names of the input vector to the added ID column.)
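In purrr 1.0.0 and later, `map_dfr()` is superseded (it still works) in favour of `map()` followed by `list_rbind()`, whose `names_to` argument plays the role of `.id`. A minimal sketch of that form follows; `set_names()` with no second argument names the vector after its own values. The two demo files are made-up stand-ins for `gel_filenames`, and `read_bed` is repeated so the block runs on its own.

```r
library(dplyr)
library(purrr)   # list_rbind() requires purrr >= 1.0.0

read_bed <- function(filename) {
  read.delim(filename) %>%
    setNames(c("Chromosome", "Start", "End", "LogPVal", "LogFC", "Strand")) %>%
    arrange(Start)
}

# Two made-up files with a header line, standing in for gel_filenames
gel_filenames <- c(tempfile(fileext = ".bed"), tempfile(fileext = ".bed"))
for (f in gel_filenames) {
  write.table(data.frame(chr = "chr1", start = c(200, 100), end = c(250, 150),
                         logp = c(4, 3), logfc = c(2.1, 0.8), strand = "+"),
              f, sep = "\t", quote = FALSE, row.names = FALSE)
}

gel <- gel_filenames %>%
  set_names() %>%                     # names = the file paths themselves
  map(read_bed) %>%                   # named list of sorted data frames
  list_rbind(names_to = "Filename")   # stack them, recording the source file
```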
This will create one master table for the "gel" data, with an added ID column holding the original filename (you'll probably want to extract just some ID from that, using `tidyr::extract`).
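For that last step, here is a sketch of pulling two sample IDs out of the filename with `tidyr::extract()`. The regular expression and the output column names `SampleA`/`SampleB` are assumptions based on the `..._w_...` pattern of the file names in the question; adjust them to whatever the parts of the name actually mean. The two-row table is made up.

```r
library(tidyr)

# Made-up rows standing in for the combined table's Filename column
gel <- data.frame(
  Filename = c(
    "XL1_3_S35_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed",
    "XL2_3b_S38_L004_R1_001_w_XL1_3b_S36_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed"
  ),
  Start = c(100, 200)
)

# Capture the "XL<n>_<n>[b]" part on each side of "_w_" (hypothetical column names)
gel <- tidyr::extract(gel, Filename, into = c("SampleA", "SampleB"),
                      regex = "(XL\\d+_\\d+b?)_.*_w_(XL\\d+_\\d+b?)_",
                      remove = FALSE)
gel[, c("SampleA", "SampleB")]
```

`remove = FALSE` keeps the original `Filename` column next to the extracted IDs.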