UUU
UUU

Reputation: 27

Extract read position from bam file

I have a vcf file that contains several SNPs and now I want to see, whether these SNPs are evenly distributed over the reads of the bam file from which I got the SNPs. Specifically, I want to plot the number of SNPs over read position. I am wondering whether there is some tool around for doing this or whether I have to write a script on my own. If so, is there a package in R with which I can do that (I am used to R but don't have much experience with perl)?

Upvotes: 1

Views: 1691

Answers (1)

Martin Morgan
Martin Morgan

Reputation: 46856

Not sure what 'SNPs over read position' means, but you could read the VCF with the R / Bioconductor package and function VariantAnnotation::readVcf, and use the genomic coordinates to query the bam file with Rsamtools::countBam, using ScanBamParam. Without testing, along the lines of

## first-time installation
source("http://bioconductor.org/biocLite.R")
biocLite(c("VariantAnnotation", "Rsamtools"))

to install the relevant packages, and then

library(VariantAnnotation) # also loads Rsamtools
snps = readVcf("/some/file.vcf")
param = ScanBamParam(which=rowData(vcf))
reads = countBam("/some/file.bam", param=param)

The best way to implement this might depend alot on how many SNPs you're interested in. I'd suggest that you use the pre-release R-2.15 alpha, as you will then get a more recent set of Bioconductor packages. These packages have extensive vignettes (vignette(package="VariantAnnotation") and knowledgeable people on the Bioconductor mailing list, as well as the usual help pages ?readVcf.

Upvotes: 2

Related Questions