Reputation: 27
I have a vcf file that contains several SNPs and now I want to see, whether these SNPs are evenly distributed over the reads of the bam file from which I got the SNPs. Specifically, I want to plot the number of SNPs over read position. I am wondering whether there is some tool around for doing this or whether I have to write a script on my own. If so, is there a package in R with which I can do that (I am used to R but don't have much experience with perl)?
Upvotes: 1
Views: 1691
Reputation: 46856
Not sure what 'SNPs over read position' means, but you could read the VCF with the R / Bioconductor package and function VariantAnnotation::readVcf, and use the genomic coordinates to query the bam file with Rsamtools::countBam, using ScanBamParam
. Without testing, along the lines of
## first-time installation
source("http://bioconductor.org/biocLite.R")
biocLite(c("VariantAnnotation", "Rsamtools"))
to install the relevant packages, and then
library(VariantAnnotation) # also loads Rsamtools
snps = readVcf("/some/file.vcf")
param = ScanBamParam(which=rowData(vcf))
reads = countBam("/some/file.bam", param=param)
The best way to implement this might depend alot on how many SNPs you're interested in. I'd suggest that you use the pre-release R-2.15 alpha, as you will then get a more recent set of Bioconductor packages. These packages have extensive vignettes (vignette(package="VariantAnnotation")
and knowledgeable people on the Bioconductor mailing list, as well as the usual help pages ?readVcf
.
Upvotes: 2