user2300940
user2300940

Reputation: 2385

Merge columnwise from file_list

I have 96 files in file_list

file_list <- list.files(pattern = "*.mirna")

They all have the same columns, but the number of rows varies. Example file:

> head(test1)
                       seq          name freq             mir start end mism   add t5  t3       s5       s3    DB
1    TGGAGTGTGATAATGGTGTTT seq_100003_x4    4  hsa-miR-122-5p    15  35 11TC     0  0   g GCTGTGGA TTTGTGTC miRNA
2 TGTAAACATCCCCGACCGGAAGCT seq_100045_x4    4  hsa-miR-30d-5p     6  29 17CT     0  0  CT TTGTTGTA GAAGCTGT miRNA
3   CTAGACTGAAGCTCCTTGAAAA seq_100048_x4    4 hsa-miR-151a-3p    47  65    0 I-AAA  0  gg CCTACTAG GAGGACAG miRNA
4   AGGCGGAGACTTGGGCAATTGC seq_100059_x4    4   hsa-miR-25-5p    14  35    0     0  0   C TGAGAGGC ATTGCTGG miRNA
5    AAACCGTTACCATTACTGAAT seq_100067_x4    4    hsa-miR-451a    17  35    0  I-AT  0 gtt AAGGAAAC AGTTTAGT miRNA
6   TGAGGTAGTAGCTTGTGCTGTT seq_10007_x24   24   hsa-let-7i-5p     6  27 12CT     0  0   0 TGGCTGAG TGTTGGTC miRNA
     precursor ambiguity
1  hsa-mir-122         1
2  hsa-mir-30d         1
3 hsa-mir-151a         1
4   hsa-mir-25         1
5 hsa-mir-451a         1
6   hsa-let-7i         1

second file

> head(test2)
                      seq           name freq             mir start end mism  add t5 t3       s5       s3    DB
1   ATTGCACTTGTCCTGGCCTGT seq_1000013_x1    1  hsa-miR-92a-3p    49  69 14TC    0  t  0 AAAGTATT CTGTGGAA miRNA
2   AAACCGTTACTATTACTGAGA seq_1000094_x1    1    hsa-miR-451a    17  36 11TC  I-A  0 tt AAGGAAAC AGTTTAGT miRNA
3  TGAGGTAGCAGATTGTATAGTC seq_1000169_x1    1   hsa-let-7f-5p     8  28  9CT  I-C  0  t GGGATGAG AGTTTTAG miRNA
4    TGGGTCTTTGCGGGCGAGAT seq_100019_x12   12 hsa-miR-193a-5p    21  40    0    0  0 ga GGGCTGGG ATGAGGGT miRNA
5  TGAGGTAGTAGATTGTATAGTG seq_100035_x12   12   hsa-let-7f-5p     8  28    0  I-G  0  t GGGATGAG AGTTTTAG miRNA
6 TGAAGTAGTAGGTTGTGTGGTAT seq_1000437_x1    1   hsa-let-7b-5p     6  26  4AG I-AT  0  t GGGGTGAG GGTTTCAG miRNA
      precursor ambiguity
1 hsa-mir-92a-2         1
2  hsa-mir-451a         1
3  hsa-let-7f-2         1
4  hsa-mir-193a         1
5  hsa-let-7f-2         1
6    hsa-let-7b         1

I would like to create a unique ID consisting of the columns mir and seq:

hsa-miR-122-5p_TGGAGTGTGATAATGGTGTTT

Then I would like to merge all the 96 files based in this ID and take the column freq form each file.

ID                                    freq_file1     freq_file2     ...
hsa-miR-122-5p_TGGAGTGTGATAATGGTGTTT  4           12

If an ID is not pressent in a specific file the freq should be NA

Upvotes: 2

Views: 73

Answers (1)

akrun
akrun

Reputation: 886938

We can use Reduce with merge on a list of data.frames.

lst <- lapply(mget(ls(pattern="test\\d+")), 
    function(x) subset(transform(x, ID=paste(precursor, 
            seq)), select=c("ID", "freq")))
Reduce(function(...) merge(..., by = "ID"), lst)

NOTE: In the above, I assumed that the "test1", "test2" objects are already created in the global environment by reading the files in 'file_list'. If not, we can directly read the files into a list instead of creating additional data.frame objects i.e.

 library(data.table)
 lst <- lapply(file_list, function(x) 
    fread(x, select=c("precursor", "seq", "freq"))[, 
        list(ID=paste(precursor, seq), freq=freq)])
 Reduce(function(x,y) x[y, on = "ID"], lst)

Or instead of fread (from data.table) use read.csv/read.table and use merge as before on 'lst'

Upvotes: 2

Related Questions