How to manipulate data.frame with position index and reconstruct desired one?

Question

I have a data.frame, inputDF, made of two index vectors: the query index, named que, and the target index, named subj. It is the result of searching for overlaps among the intervals of three separate data.frames at once (think of aligning three interval sets in parallel). I want to rebuild this data.frame of position indices in a particular way: reduce the dimension of inputDF, regroup the indices, and build a new data.frame with one row per query that lays out the indices overlapping it side by side. Is there an easy and efficient way to manipulate inputDF and construct my desired data.frame?

[Figure: visualization of the interval alignment]

Here is the resulting example data.frame:

inputDF <- data.frame(
    que=c(5 , 7 , 8 , 9 ,14 ,16, 17 ,20 ,21, 22 , 8 , 9 ,16 ,22 , 2 ,12 ,15 ,18,
          21 , 4 , 3 , 7 ,15 ,21 ,13 ,19 , 4 , 5 , 6, 13, 14, 19 ,20, 2 , 3 ,12,
          18 , 6 , 5 ,11, 14, 20  ,8 ,16 ,22 , 9 ,17 , 1, 10 , 1 , 2 , 3, 11,12,
          18 , 1 ,10),
    subj=c( 5 , 7 , 8, 17 , 5 ,8 ,17 , 5 ,7 ,8, 22 ,22, 22, 22 , 2 ,2 ,15, 2,
            15  ,4  ,3 ,21 ,21 ,21 ,13 ,13 ,20 ,20 ,20 ,19 ,20 ,19 ,20 ,12 ,12 ,12,
            12 ,6 ,14 ,11 ,14 ,14 ,16 ,16 ,16 ,9  ,9  ,1  ,1 ,18 ,18 ,18 ,18 ,18, 18 ,10 ,10)
)

To build the desired data.frame, I use NA in subj_2 wherever a query has no second overlapping interval.

This is my desired data.frame:

desiredDF <- data.frame(
    que=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22),
    self.subj=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22),
    subj_1=c(10,12,12,20,14,20,21,16,17,1,18,12,19,5,21,8,9,12,13,5,7,8),
    subj_2=c(18,18,18,NA,20,NA,NA,22,22,NA,NA,18,NA,20,NA,22,NA,18,NA,14,15,16)
)

Edit:

For example, these are the interval data, followed by how my desired data.frame is constructed from them:

intDF <- list(
    bar=data.frame(start=c(8,18,33,53,69,81,105,115,135),
                   stop=c(14,21,39,61,73,87,111,120,153)),
    cat=data.frame(start=c(6,15,20,44,71,99,113,141),
                   stop=c(10,17,34,51,78,103,124,147)),
    foo=data.frame(start=c(11,43,57,101,117), 
                   stop=c(36,49,92,109,139))
)

library(dplyr)  # provides bind_rows()

# Stack the three interval sets into one data.frame. A position index such as
# `10` or `11` then refers to the 10th or 11th row of `intDF`, and so on.
intDF <- bind_rows(intDF)

que self.sub subj1 subj2
1   1        10    18
2   2        12    18
3   3        12    18
4   4        20    NA
5   5        14    20
6   6        20    NA
7   7        21    NA
8   8        16    22

How can I achieve my desired data.frame? Is there an efficient way to manipulate inputDF to build it?

jeremycg · Accepted Answer

We can do this using dplyr.

First we group by 'que' and sort by 'subj'. Then, within each group, we take the first and second 'subj' values that are not equal to 'que' as the new columns; a group with fewer than two such values gets NA:

library(dplyr)
inputDF %>%
 group_by(que) %>%
 arrange(subj) %>%
 summarise(self.sub = que[1], subj1 = subj[subj!=que][1], subj2 = subj[subj!=que][2])

Source: local data frame [22 x 4]

     que self.sub subj1 subj2
   (dbl)    (dbl) (dbl) (dbl)
1      1        1    10    18
2      2        2    12    18
3      3        3    12    18
4      4        4    20    NA
5      5        5    14    20
6      6        6    20    NA
7      7        7    21    NA
8      8        8    16    22
9      9        9    17    22
10    10       10     1    NA
..   ...      ...   ...   ...
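
For comparison, here is a minimal base R sketch of the same reshaping, with no packages. It is an illustration rather than part of the dplyr solution: `pairs` is a small hypothetical subset of inputDF (the rows for que 1, 4 and 5), and `nth_partner` is a helper name made up for this example.

```r
# Hypothetical subset of inputDF: only the rows for que 1, 4 and 5
pairs <- data.frame(
  que  = c(5, 4, 4, 5, 5, 1, 1, 1),
  subj = c(5, 4, 20, 20, 14, 1, 18, 10)
)

# Drop self-hits, then collect the sorted partners of each query
others   <- pairs[pairs$subj != pairs$que, ]
partners <- lapply(split(others$subj, others$que), sort)

# k-th partner of query q, or NA when q has fewer than k partners
nth_partner <- function(q, k) {
  p <- partners[[as.character(q)]]
  if (length(p) >= k) p[k] else NA_real_
}

que <- sort(unique(pairs$que))
result <- data.frame(
  que      = que,
  self.sub = que,
  subj1    = vapply(que, nth_partner, numeric(1), k = 1),
  subj2    = vapply(que, nth_partner, numeric(1), k = 2)
)
print(result)
```

The explicit length check is there because indexing a missing list element gives NULL rather than NA, which would break the fixed-length result that `vapply` expects.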

In response to your edit, we can use the IRanges package:

library(IRanges)
myranges = IRanges(start = intDF$start, end = intDF$stop)
data = as.data.frame(findOverlaps(myranges))
data
   queryHits subjectHits
1          1          10
2          1           1
3          1          18
4          2          18
5          2           2
6          2          12
7          3          18
8          3          12
9          3           3
10         4           4
...       ...         ...
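
The two steps can then be chained. The following is a sketch, assuming that IRanges (a Bioconductor package) and dplyr are both installed, and that `as.data.frame()` on the hits returns the columns in the order queryHits, subjectHits as in the output above. The names `hits` and `result` are chosen for this example only. dplyr functions are called with the `dplyr::` prefix because IRanges and its dependencies mask some of them.

```r
library(dplyr)
library(IRanges)

# Interval sets from the question, stacked so that row number = position index
intDF <- dplyr::bind_rows(list(
  bar = data.frame(start = c(8, 18, 33, 53, 69, 81, 105, 115, 135),
                   stop  = c(14, 21, 39, 61, 73, 87, 111, 120, 153)),
  cat = data.frame(start = c(6, 15, 20, 44, 71, 99, 113, 141),
                   stop  = c(10, 17, 34, 51, 78, 103, 124, 147)),
  foo = data.frame(start = c(11, 43, 57, 101, 117),
                   stop  = c(36, 49, 92, 109, 139))
))

# Overlaps of the stacked intervals against themselves (self-hits included)
myranges <- IRanges(start = intDF$start, end = intDF$stop)
hits <- as.data.frame(findOverlaps(myranges))
names(hits) <- c("que", "subj")  # rename to match inputDF

# Same reshaping as before: first two non-self partners per query
result <- hits %>%
  dplyr::arrange(que, subj) %>%
  dplyr::group_by(que) %>%
  dplyr::summarise(self.sub = que[1],
                   subj1 = subj[subj != que][1],
                   subj2 = subj[subj != que][2])
print(result)
```

Note that a query overlapping more than two other intervals keeps only its two smallest partner indices here; add further `subj3`, `subj4` columns in the same way if you need them.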
