Reputation: 2385
I have 96 files in file_list
file_list <- list.files(pattern = "*.mirna")
They all have the same columns, but the number of rows varies. Example file:
> head(test1)
seq name freq mir start end mism add t5 t3 s5 s3 DB
1 TGGAGTGTGATAATGGTGTTT seq_100003_x4 4 hsa-miR-122-5p 15 35 11TC 0 0 g GCTGTGGA TTTGTGTC miRNA
2 TGTAAACATCCCCGACCGGAAGCT seq_100045_x4 4 hsa-miR-30d-5p 6 29 17CT 0 0 CT TTGTTGTA GAAGCTGT miRNA
3 CTAGACTGAAGCTCCTTGAAAA seq_100048_x4 4 hsa-miR-151a-3p 47 65 0 I-AAA 0 gg CCTACTAG GAGGACAG miRNA
4 AGGCGGAGACTTGGGCAATTGC seq_100059_x4 4 hsa-miR-25-5p 14 35 0 0 0 C TGAGAGGC ATTGCTGG miRNA
5 AAACCGTTACCATTACTGAAT seq_100067_x4 4 hsa-miR-451a 17 35 0 I-AT 0 gtt AAGGAAAC AGTTTAGT miRNA
6 TGAGGTAGTAGCTTGTGCTGTT seq_10007_x24 24 hsa-let-7i-5p 6 27 12CT 0 0 0 TGGCTGAG TGTTGGTC miRNA
precursor ambiguity
1 hsa-mir-122 1
2 hsa-mir-30d 1
3 hsa-mir-151a 1
4 hsa-mir-25 1
5 hsa-mir-451a 1
6 hsa-let-7i 1
second file
> head(test2)
seq name freq mir start end mism add t5 t3 s5 s3 DB
1 ATTGCACTTGTCCTGGCCTGT seq_1000013_x1 1 hsa-miR-92a-3p 49 69 14TC 0 t 0 AAAGTATT CTGTGGAA miRNA
2 AAACCGTTACTATTACTGAGA seq_1000094_x1 1 hsa-miR-451a 17 36 11TC I-A 0 tt AAGGAAAC AGTTTAGT miRNA
3 TGAGGTAGCAGATTGTATAGTC seq_1000169_x1 1 hsa-let-7f-5p 8 28 9CT I-C 0 t GGGATGAG AGTTTTAG miRNA
4 TGGGTCTTTGCGGGCGAGAT seq_100019_x12 12 hsa-miR-193a-5p 21 40 0 0 0 ga GGGCTGGG ATGAGGGT miRNA
5 TGAGGTAGTAGATTGTATAGTG seq_100035_x12 12 hsa-let-7f-5p 8 28 0 I-G 0 t GGGATGAG AGTTTTAG miRNA
6 TGAAGTAGTAGGTTGTGTGGTAT seq_1000437_x1 1 hsa-let-7b-5p 6 26 4AG I-AT 0 t GGGGTGAG GGTTTCAG miRNA
precursor ambiguity
1 hsa-mir-92a-2 1
2 hsa-mir-451a 1
3 hsa-let-7f-2 1
4 hsa-mir-193a 1
5 hsa-let-7f-2 1
6 hsa-let-7b 1
I would like to create a unique ID consisting of the columns mir
and seq
:
hsa-miR-122-5p_TGGAGTGTGATAATGGTGTTT
Then I would like to merge all the 96 files based in this ID and take the column freq
form each file.
ID freq_file1 freq_file2 ...
hsa-miR-122-5p_TGGAGTGTGATAATGGTGTTT 4 12
If an ID is not pressent in a specific file the freq
should be NA
Upvotes: 2
Views: 73
Reputation: 886938
We can use Reduce
with merge
on a list
of data.frame
s.
lst <- lapply(mget(ls(pattern="test\\d+")),
function(x) subset(transform(x, ID=paste(precursor,
seq)), select=c("ID", "freq")))
Reduce(function(...) merge(..., by = "ID"), lst)
NOTE: In the above, I assumed that the "test1", "test2" objects are already created in the global environment by reading the files in 'file_list'. If not, we can directly read the files into a list
instead of creating additional data.frame objects i.e.
library(data.table)
lst <- lapply(file_list, function(x)
fread(x, select=c("precursor", "seq", "freq"))[,
list(ID=paste(precursor, seq), freq=freq)])
Reduce(function(x,y) x[y, on = "ID"], lst)
Or instead of fread
(from data.table
) use read.csv/read.table
and use merge
as before on 'lst'
Upvotes: 2