Reputation: 933
I am trying to compute the percentage of overlap between two datasets with genomic coordinates, satisfying certain criteria.
seg2
ID chrom loc.start loc.end num.mark seg.mean
AB 1 3010000 173490000 8430 0.0039
AB 1 173510000 173590000 5 -17.738
AB 1 173610000 173830000 12 0.011
AB 1 173850000 173970000 6 -16.121
AB 2 3090000 181990000 8434 0.011
BB 12 3090000 68990000 2950 -0.2022
BB 12 69010000 87790000 889 0.0267
BB 12 88010000 98550000 507 -0.3337
BB 12 98570000 115090000 800 0.0586
BB 12 115110000 119350000 197 -0.2031
BB 12 119370000 119430000 4 -20.671
over
chr start end CNA sample.ID
1 68580000 68640000 loss 1-68580000-68640000
3 15360000 16000000 loss 3-15360000-16000000
4 122660000 123500000 gain 4-122660000-123500000
7 48320000 48400000 loss 7-48320000-48400000
12 115860000 115980000 loss 12-115860000-115980000
12 113560000 114920000 gain 12-113560000-114920000
expected output
ID chrom loc.start loc.end num.mark seg.mean lm(percentage of overlap)
AB 1 3010000 173490000 8430 0.0039 %
AB 1 173510000 173590000 5 -17.738
AB 1 173610000 173830000 12 0.011
AB 1 173850000 173970000 6 -16.121
AB 2 3090000 181990000 8434 0.011
BB 12 3090000 68990000 2950 -0.2022
BB 12 69010000 87790000 889 0.0267
BB 12 88010000 98550000 507 -0.3337
BB 12 98570000 115090000 800 0.0586
BB 12 115110000 119350000 197 -0.2031
BB 12 119370000 119430000 4 -20.671
I tried this script, but it's not working.
for (i in 1:now(seg2)) {
seg2$lm <- if((seg2$chrom[i] == over$chr[i]) |
(seg2$loc.start[i] <= over$start[i] & seg2$loc.end[i] >= over$end[i]) |
(over$seg.mean[i] >= 0.459 & seg2$CNA[i] == "gain") |
(over$seg.mean[i] <= -0.678 & seg2$CNA[i] == "loss"),
(over$end[i]-over$start[i])/(seg2$loc.end[i]-seg2$loc.start[i])*100)
}
I am aware of the GenomicRanges package, but would be grateful for suggestions.
Upvotes: 2
Views: 1749
Reputation: 4472
I would strongly suggest you to use GenomicFeatures
to do this efficiently. If you are already aware of creating your own Granges
objects then you need to do following two steps to get the length of overlap
# to find overlaps
overlappin.index = findOverlaps(object1, object2)
# to get the overlap length
width(ranges(overlapping.index, ranges(object1),ranges(object2)))
Where, "object1" and "object2" are the GRanges
objects with coordinates, and "overlappin.index" is the indexes of the objects which are in overlap.
Once you have the length you can easily get the percentages.
Upvotes: 2