How to select range of rows in R

Question

I have a data frame called mydf and a vector myvec <- c("chr5:11", "chr3:112", "chr22:334"). If any element of myvec matches a key in mydf, I want to select the range of rows around that match (keys up to 3 values below and 3 values above it) and collect them as a subset of mydf (result).

Since chr5:11 in myvec matches a key in mydf, result should contain the rows with keys from chr5:8 (three values below) to chr5:14 (three values above).

mydf <- structure(list(key = structure(c(5L, 2L, 7L, 8L, 4L, 1L, 6L, 
3L, 11L, 10L, 9L), .Names = c("34", "35", "36", "37", "38", "39", 
"40", "41", "42", "43", "44"), .Label = c("chr5:10", "chr5:11", 
"chr5:1123", "chr5:118", "chr5:12", "chr5:123", "chr5:13", "chr5:14", 
"chr5:19", "chr5:8", "chr5:9"), class = "factor"), variantId = structure(1:11, .Names = c("34", 
"35", "36", "37", "38", "39", "40", "41", "42", "43", "44"), .Label = c("9920068", 
"9920069", "9920070", "9920071", "9920072", "9920073", "9920074", 
"9920075", "9920076", "9920077", "9920078"), class = "factor")), .Names = c("key", 
"variantId"), row.names = c("34", "35", "36", "37", "38", "39", 
"40", "41", "42", "43", "44"), class = "data.frame")

result

     key         variantId
43 "chr5:8"    "9920077"
42 "chr5:9"    "9920076"
39 "chr5:10"   "9920073"
35 "chr5:11"   "9920069"
34 "chr5:12"   "9920068"
36 "chr5:13"   "9920070"
37 "chr5:14"   "9920071"

Ven Yao · Accepted Answer

You can use the GenomicRanges package.

library(GenomicRanges)

myvec <- c("chr5:11", "chr3:112", "chr22:334")
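# Window of 3 positions on either side of each position in myvec.
# (IRanges(pos - 3, pos) + 3 would widen both ends again, giving pos - 6 to pos + 3.)
myvec.gr <- GRanges(gsub(":.+", "", myvec), 
                    IRanges(as.numeric(gsub(".+:", "", myvec)) - 3,
                            as.numeric(gsub(".+:", "", myvec)) + 3))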

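# One width-1 range per row of mydf, at the position encoded in its key.
mydf.gr <- GRanges(gsub(":.+", "", mydf[,"key"]), 
                   IRanges(as.numeric(gsub(".+:", "", mydf[,"key"])),
                           as.numeric(gsub(".+:", "", mydf[,"key"]))))

# Find the rows of mydf that fall inside any window from myvec.
d.v.op <- findOverlaps(mydf.gr, myvec.gr)

# Rows come back in mydf order, not sorted by position.
mydf[queryHits(d.v.op), ]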
#    key       variantId
# 34 "chr5:12" "9920068"
# 35 "chr5:11" "9920069"
# 36 "chr5:13" "9920070"
# 37 "chr5:14" "9920071"
# 39 "chr5:10" "9920073"
# 42 "chr5:9"  "9920076"
# 43 "chr5:8"  "9920077"
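
If installing a Bioconductor package is not an option, the same window logic can be sketched in base R. This is only a sketch under a few assumptions: keys always have the form "chr:position", the helper split_key and the variables flank, df_pos, vec_pos and keep are names made up for illustration, and mydf is rebuilt here with plain character columns so the example is self-contained. Like the code above, it selects rows by position window and does not check that the exact key from myvec is present in mydf. It also sorts by position, which gives the row order shown in the question.

```r
# Toy copy of the question's mydf (character columns instead of factors).
mydf <- data.frame(
  key = c("chr5:12", "chr5:11", "chr5:13", "chr5:14", "chr5:118", "chr5:10",
          "chr5:123", "chr5:1123", "chr5:9", "chr5:8", "chr5:19"),
  variantId = as.character(9920068:9920078),
  row.names = as.character(34:44),
  stringsAsFactors = FALSE
)
myvec <- c("chr5:11", "chr3:112", "chr22:334")

# Hypothetical helper: split "chr:pos" keys into chromosome and numeric position.
split_key <- function(x) {
  x <- as.character(x)  # also works if the column is a factor
  data.frame(chr = sub(":.*$", "", x),
             pos = as.numeric(sub("^.*:", "", x)),
             stringsAsFactors = FALSE)
}

df_pos  <- split_key(mydf$key)
vec_pos <- split_key(myvec)
flank   <- 3  # how many positions to include on each side of a match

# Keep a row when it shares a chromosome with some element of myvec
# and lies within `flank` positions of it.
keep <- vapply(seq_len(nrow(df_pos)), function(i) {
  any(vec_pos$chr == df_pos$chr[i] &
      abs(vec_pos$pos - df_pos$pos[i]) <= flank)
}, logical(1))

result <- mydf[keep, ]
# Sort by chromosome, then by position within the chromosome.
result <- result[order(df_pos$chr[keep], df_pos$pos[keep]), ]
result
```

The chromosome sort here is alphabetical ("chr10" before "chr2"), which is fine for a single chromosome but would need a numeric sort key if several chromosomes end up in result.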
