Loop for selecting values based on a column range

Question

I have a database of genes located on different chromosomes and at different positions. I also have a list of markers, each with its own chromosome and position. I want to find the genes that lie "around" each marker, for example all genes on the same chromosome within +/- 50 kb (50,000 bp) of a given marker. I also want each output row to include the information of the marker that the gene was matched to.

This is what I have:

Genes:

gene    chrom   position
1_1 1   2164
1_2 1   11418
1_3 1   24840
1_4 1   63649
1_5 1   82098
1_6 1   110179
1_7 1   155165
1_8 1   186074
2_1 2   143076
2_2 2   148971
2_3 2   154134
2_4 2   165298
3_1 3   25612
3_2 3   65767
3_3 3   81952
3_4 3   111681
3_5 3   116253

Markers:

Marker  chrom   position
1   1   101054
2   1   155002
3   9   6073302
4   8   5297131
5   5   12294888
6   8   6269394
7   10  1313426
8   1   56156551

And this is the output I want (a sample):

Marker  chrom   position    gene    chrom   position
1   1   101054  1_4 1   63649
1   1   101054  1_5 1   82098
1   1   101054  1_6 1   110179
2   1   155002  1_6 1   110179
2   1   155002  1_7 1   155165
2   1   155002  1_8 1   186074

This is my code so far:

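marker <- read.table("markers.txt", sep = "\t", header = TRUE)
gene <- read.table("genes.txt", sep = "\t", header = TRUE)

# Search window: 50 kb on either side of each marker
marker$low.lim <- marker$position - 50000
marker$up.lim <- marker$position + 50000

# Genes on the same chromosome as the FIRST marker that fall inside its window
new <- gene[gene$chrom == marker$chrom[1] & gene$position %in% (marker$low.lim[1]:marker$up.lim[1]), ]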

This works for one marker, but I can't figure out how to turn it into a loop over all the markers. Thanks!

Ven Yao · Accepted Answer

The Bioconductor package GenomicRanges is designed for working with genomic ranges, and its findOverlaps() function does this matching without an explicit loop.

g.txt <- "gene    chrom   position
1_1 1   2164
1_2 1   11418
1_3 1   24840
1_4 1   63649
1_5 1   82098
1_6 1   110179
1_7 1   155165
1_8 1   186074
2_1 2   143076
2_2 2   148971
2_3 2   154134
2_4 2   165298
3_1 3   25612
3_2 3   65767
3_3 3   81952
3_4 3   111681
3_5 3   116253"

m.txt <- "Marker  chrom   position
1   1   101054
2   1   155002
3   9   6073302
4   8   5297131
5   5   12294888
6   8   6269394
7   10  1313426
8   1   56156551"

genes <- read.table(text = g.txt, header = TRUE, as.is = TRUE)
mark <- read.table(text = m.txt, header = TRUE, as.is = TRUE)

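library(GenomicRanges)

# Each gene becomes a 1-bp range at its position
genes.gr <- GRanges(genes$chrom, IRanges(genes$position, genes$position))
# Each marker becomes a window of +/- 50 kb around its position
mark.gr <- GRanges(mark$chrom, IRanges(mark$position - 50000, mark$position + 50000))

# (gene, marker) pairs whose ranges overlap on the same chromosome; this may warn
# that some chromosomes appear in only one of the two objects, which is harmless here
g.m.op <- findOverlaps(genes.gr, mark.gr)
# subjectHits() indexes the markers, queryHits() indexes the genes
cbind(mark[subjectHits(g.m.op), ], genes[queryHits(g.m.op), ])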
#     Marker chrom position gene chrom position
# 1        1     1   101054  1_4     1    63649
# 1.1      1     1   101054  1_5     1    82098
# 1.2      1     1   101054  1_6     1   110179
# 2        2     1   155002  1_6     1   110179
# 2.1      2     1   155002  1_7     1   155165
# 2.2      2     1   155002  1_8     1   186074
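Since the question asks specifically about a loop, here is a minimal base-R sketch that does the same matching without Bioconductor. It assumes the genes and markers are data frames named `genes` and `mark` with the column names from the question (built inline with `data.frame()` so the block runs on its own); `window` and `hits` are illustrative names, not part of the original code. Positions are compared with `>=` and `<=`, which avoids building a 100,001-element integer sequence for every marker.

```r
# Base-R sketch (no Bioconductor): loop over markers and keep the genes on the
# same chromosome whose position lies within +/- `window` bp of the marker.
genes <- data.frame(
  gene = c("1_1", "1_2", "1_3", "1_4", "1_5", "1_6", "1_7", "1_8",
           "2_1", "2_2", "2_3", "2_4", "3_1", "3_2", "3_3", "3_4", "3_5"),
  chrom = c(rep(1, 8), rep(2, 4), rep(3, 5)),
  position = c(2164, 11418, 24840, 63649, 82098, 110179, 155165, 186074,
               143076, 148971, 154134, 165298,
               25612, 65767, 81952, 111681, 116253),
  stringsAsFactors = FALSE
)
mark <- data.frame(
  Marker = 1:8,
  chrom = c(1, 1, 9, 8, 5, 8, 10, 1),
  position = c(101054, 155002, 6073302, 5297131, 12294888,
               6269394, 1313426, 56156551)
)

window <- 50000  # half-width of the search window, in bp

# One data frame (or NULL when nothing matches) per marker
hits <- lapply(seq_len(nrow(mark)), function(i) {
  m <- mark[i, ]
  g <- genes[genes$chrom == m$chrom &
               genes$position >= m$position - window &
               genes$position <= m$position + window, ]
  if (nrow(g) == 0) return(NULL)
  # Repeat the marker row once per matching gene and bind side by side
  cbind(m[rep(1, nrow(g)), ], g)
})

# rbind() silently drops the NULL entries for markers without nearby genes
result <- do.call(rbind, hits)
rownames(result) <- NULL
print(result)
```

On the example data this should give the same six marker/gene pairs as the desired output in the question. A loop like this scans every gene once per marker, which is fine for small tables; for genome-scale data the findOverlaps() approach above is likely to scale better.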
