Reputation: 25
I have a database of genes located on different chromosomes at different positions. I also have a list of markers, each with its own chromosome and position. I want to find the genes that are "around" each marker, for example all genes within +/- 50,000 positions of a given marker. I also want the output to include the marker's information for every gene that is found.
This is what I have:
Genes:
gene chrom position
1_1 1 2164
1_2 1 11418
1_3 1 24840
1_4 1 63649
1_5 1 82098
1_6 1 110179
1_7 1 155165
1_8 1 186074
2_1 2 143076
2_2 2 148971
2_3 2 154134
2_4 2 165298
3_1 3 25612
3_2 3 65767
3_3 3 81952
3_4 3 111681
3_5 3 116253
Markers:
Marker chrom position
1 1 101054
2 1 155002
3 9 6073302
4 8 5297131
5 5 12294888
6 8 6269394
7 10 1313426
8 1 56156551
And this is what I want (sample):
Marker chrom position gene chrom position
1 1 101054 1_4 1 63649
1 1 101054 1_5 1 82098
1 1 101054 1_6 1 110179
2 1 155002 1_6 1 110179
2 1 155002 1_7 1 155165
2 1 155002 1_8 1 186074
This is my code so far:
marker<-read.table("markers.txt",sep="\t",header=T)
gene<-read.table("genes.txt",sep="\t",header=T)
marker$low.lim<-marker$position-50000
marker$up.lim<-marker$position+50000
new<-gene[gene$chrom==marker$chrom[1] & gene$position %in% (marker$low.lim[1]:marker$up.lim[1]),]
This works for the first marker only. I can't figure out how to turn it into a loop over all markers. Thanks.
Upvotes: 0
Views: 48
Reputation: 3710
The R package GenomicRanges is helpful for dealing with genomic ranges.
g.txt <- "gene chrom position
1_1 1 2164
1_2 1 11418
1_3 1 24840
1_4 1 63649
1_5 1 82098
1_6 1 110179
1_7 1 155165
1_8 1 186074
2_1 2 143076
2_2 2 148971
2_3 2 154134
2_4 2 165298
3_1 3 25612
3_2 3 65767
3_3 3 81952
3_4 3 111681
3_5 3 116253"
m.txt <- "Marker chrom position
1 1 101054
2 1 155002
3 9 6073302
4 8 5297131
5 5 12294888
6 8 6269394
7 10 1313426
8 1 56156551"
genes <- read.table(text=g.txt, head=T, as.is=T)
mark <- read.table(text=m.txt, head=T, as.is=T)
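library(GenomicRanges)
# Each gene is a 1-position range at its own position.
genes.gr <- GRanges(genes$chrom, IRanges(genes$position, genes$position))
# Each marker becomes a window of +/- 50000 around its position.
mark.gr <- GRanges(mark$chrom, IRanges(mark$position-50000, mark$position+50000))
# Find every gene that falls inside a marker window on the same chromosome.
g.m.op <- findOverlaps(genes.gr, mark.gr)
# queryHits() indexes genes, subjectHits() indexes markers; bind matching rows side by side.
cbind(mark[subjectHits(g.m.op), ], genes[queryHits(g.m.op), ])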
# Marker chrom position gene chrom position
# 1 1 1 101054 1_4 1 63649
# 1.1 1 1 101054 1_5 1 82098
# 1.2 1 1 101054 1_6 1 110179
# 2 2 1 155002 1_6 1 110179
# 2.1 2 1 155002 1_7 1 155165
# 2.2 2 1 155002 1_8 1 186074
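If installing a Bioconductor package is not an option, the same idea can probably be expressed in base R without an explicit loop. This is only a sketch of an alternative technique (a join on chrom followed by a distance filter), not the GenomicRanges method above. The small data frames are a subset of the question's data, and the names `window`, `pairs` and `near` are made up for illustration.

```r
# Subset of the question's data, typed inline so the example is self-contained.
genes <- data.frame(
  gene     = c("1_4", "1_5", "1_6", "1_7", "1_8", "2_1"),
  chrom    = c(1, 1, 1, 1, 1, 2),
  position = c(63649, 82098, 110179, 155165, 186074, 143076)
)
mark <- data.frame(
  Marker   = c(1, 2, 3),
  chrom    = c(1, 1, 9),
  position = c(101054, 155002, 6073302)
)
window <- 50000  # hypothetical name for the +/- distance from the question

# Pair every marker with every gene on the same chromosome.
# Both tables have a "position" column, so merge() adds the suffixes.
pairs <- merge(mark, genes, by = "chrom", suffixes = c(".marker", ".gene"))

# Keep only the pairs whose positions are at most `window` apart.
near <- pairs[abs(pairs$position.gene - pairs$position.marker) <= window, ]

# Sort by marker, then by gene position, for a readable result.
near <- near[order(near$Marker, near$position.gene), ]
print(near, row.names = FALSE)
```

Note that merge() builds every marker/gene combination per chromosome before filtering, so on large tables this may use a lot of memory; an overlap-based approach is likely to scale better there.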
Upvotes: 1