Reputation: 111
I have this data frame (the output of multibedintersect run on 8 different BED files from my ChIP-seq data):
head(Table)
   chrom   start     end num    list
2   chr1 4491607 4493602   2     6,7
6   chr1 4571540 4571826   2     7,8
15  chr1 5019126 5020672   2     2,7
21  chr1 7139275 7139745   3   4,6,7
23  chr1 7398185 7398658   2     7,8
28  chr1 9745462 9745912   4 1,4,6,7
The column "list" is a character string listing the samples in which that particular peak is present.
For example, peak "2" is found in both sample 6 and sample 7.
I want to count how many times every combination of 2 samples is found in the dataset, and create a table that summarizes this information.
So basically multibedintersect gives back too many overlaps. I'm only interested in how the samples overlap with each other, two at a time.
For example, samples 6 and 7 are found together in peaks 2, 21 and 28, and samples 4 and 6 are found together in peaks 21 and 28.
With the tidyverse package, I'm able to address the issue for one combination at a time, but I'm not able to make it loop over every combination.
Table %>%
filter(str_detect(list, "6,7"))
In this way I get back anything that has that combination:
   chrom   start     end num    list
2   chr1 4491607 4493602   2     6,7
21  chr1 7139275 7139745   3   4,6,7
28  chr1 9745462 9745912   4 1,4,6,7
I think this is inefficient and very script-intensive, as I would need to filter manually for every combination.
Doing this "my way" would be something horrible like this, to name just a few combinations:
Counts <- NULL
Pippo <- Table %>%
filter(str_detect(list, "7,8"))
Counts <- cbind(nrow(Pippo))
Pippo <- Table %>%
filter(str_detect(list, "6,8"))
Counts <- cbind(Counts, nrow(Pippo))
Pippo <- Table %>%
filter(str_detect(list, "5,8"))
Counts <- cbind(Counts, nrow(Pippo))
Pippo <- Table %>%
filter(str_detect(list, "4,8"))
Counts <- cbind(Counts, nrow(Pippo))
Pippo <- Table %>%
filter(str_detect(list, "3,8"))
Counts <- cbind(Counts, nrow(Pippo))
Pippo <- Table %>%
filter(str_detect(list, "2,8"))
Counts <- cbind(Counts, nrow(Pippo))
Pippo <- Table %>%
filter(str_detect(list, "1,8"))
Counts <- cbind(Counts, nrow(Pippo))
Could you please suggest a better way to count every combination and create this summary data frame?
Thanks a lot
Upvotes: 1
Views: 69
Reputation: 107652
Consider base R with two sapply calls: one with combn to build all pair strings and then another with grepl for subsetting the data frame to retrieve row counts:
pairs <- sapply(combn(1:8, 2, simplify=FALSE), function(i) paste(i, collapse=","))
Counts <- sapply(pairs, function(i) nrow(subset(Table, grepl(i, `list`))))
Counts
# 1,2 1,3 1,4 1,5 1,6 1,7 1,8 2,3 2,4 2,5 2,6 2,7 2,8 3,4 3,5 3,6 3,7 3,8 4,5 4,6
#   0   0   1   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   2
# 4,7 4,8 5,6 5,7 5,8 6,7 6,8 7,8
#   0   0   0   0   0   3   0   2
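One caveat with matching the pair as a substring: a pattern such as "4,7" only matches when 4 and 7 sit next to each other in list, so a peak listed as 4,6,7 is not counted for the 4,7 pair. A minimal base R sketch that splits each list string into its sample IDs and tests membership instead is given here. It assumes Table looks like the head() shown in the question; the small Table built here is only a stand-in for the real data, and count_pair is a hypothetical helper name.

```r
# Stand-in for the real data: the six rows shown in the question.
Table <- data.frame(
  chrom = "chr1",
  start = c(4491607, 4571540, 5019126, 7139275, 7398185, 9745462),
  end   = c(4493602, 4571826, 5020672, 7139745, 7398658, 9745912),
  num   = c(2, 2, 2, 3, 2, 4),
  list  = c("6,7", "7,8", "2,7", "4,6,7", "7,8", "1,4,6,7"),
  stringsAsFactors = FALSE
)

# Split every "list" string into a vector of sample IDs, one vector per peak.
samples <- strsplit(Table$list, ",", fixed = TRUE)

# All unordered pairs of the 8 samples, as a 2 x 28 matrix (one pair per column).
pair_mat <- combn(as.character(1:8), 2)

# Hypothetical helper: number of peaks that contain both members of a pair.
count_pair <- function(p) {
  sum(vapply(samples, function(s) all(p %in% s), logical(1)))
}

Counts <- apply(pair_mat, 2, count_pair)
names(Counts) <- apply(pair_mat, 2, paste, collapse = ",")

# Summary data frame, keeping only pairs that occur at least once.
Summary <- data.frame(pair = names(Counts), n = unname(Counts))
Summary[Summary$n > 0, ]
```

On the six example rows this counts 2 peaks for the 4,7 pair, since peaks 21 and 28 both contain samples 4 and 7.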
Alternatively, with a tidy version (dplyr + purrr):
pairs <- combn(1:8, 2, simplify=FALSE) %>%
map(~(paste(., collapse=","))) %>%
unlist()
Counts <- pairs %>%
map(~(filter(Table, str_detect(list, .)) %>% nrow)) %>%
setNames(pairs) %>%
unlist()
Counts
# 1,2 1,3 1,4 1,5 1,6 1,7 1,8 2,3 2,4 2,5 2,6 2,7 2,8 3,4 3,5 3,6 3,7 3,8 4,5 4,6
#   0   0   1   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   2
# 4,7 4,8 5,6 5,7 5,8 6,7 6,8 7,8
#   0   0   0   0   0   3   0   2
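If a sample-by-sample table is more convenient than a named vector, the pair counts can also be read from a co-occurrence matrix. The following is only a sketch under stated assumptions: the one-column Table is a stand-in built from the six rows in the question, and ids, inc and co are illustrative names. It counts any two samples present in the same peak, whether or not they are adjacent in list.

```r
# Stand-in for the real data: only the "list" column of the six example rows.
Table <- data.frame(
  list = c("6,7", "7,8", "2,7", "4,6,7", "7,8", "1,4,6,7"),
  stringsAsFactors = FALSE
)

ids <- as.character(1:8)  # the 8 sample IDs

# Incidence matrix: one row per peak, one column per sample,
# 1 if the sample is listed for that peak and 0 otherwise.
inc <- t(vapply(
  strsplit(Table$list, ",", fixed = TRUE),
  function(s) as.integer(ids %in% s),
  integer(length(ids))
))
colnames(inc) <- ids

# t(inc) %*% inc: entry [i, j] is the number of peaks containing both i and j;
# the diagonal holds the number of peaks containing each single sample.
co <- crossprod(inc)

co["4", "7"]  # peaks shared by samples 4 and 7
co["6", "7"]  # peaks shared by samples 6 and 7
```

The matrix is symmetric, so each pair appears twice; the upper triangle alone holds the 28 pair counts.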
Upvotes: 1