I genotyped samples from methylation reads. I was surprised that many of the alternative alleles were degenerate bases: R or Y.
V00001.vcf
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001
NW_022882922.1 28895 . C T 0 PASS NS=1:DP=52 GT:GQ:DP 0/1:0:52
NW_022882922.1 36586 . C T,Y 0 PASS NS=1:DP=23:GU=T/C GT:GQ:DP 1/2:0:23
NW_022882922.1 36640 . G A 0 PASS NS=1:DP=40 GT:GQ:DP 1/1:0:40
NW_022882922.1 39071 . A G 0 PASS NS=1:DP=43 GT:GQ:DP 1/1:0:43
V0021
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001
NW_022882922.1 25160 . G Y 0 PASS NS=1:DP=34:GU=T/C GT:GQ:DP 0/1:0:34
NW_022882922.1 25676 . T C 0 PASS NS=1:DP=41 GT:GQ:DP 0/1:0:41
NW_022882922.1 28342 . G A,R 0 PASS NS=1:DP=35:GU=A/G GT:GQ:DP 1/2:0:35
NW_022882922.1 29887 . C A 0 PASS NS=1:DP=48 GT:GQ:DP 0/1:0:48
One sample had many more degenerate bases:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001
NW_022882922.1 8082 . G A 0 PASS NS=1:DP=6 GT:GQ:DP 0/1:0:6
NW_022882922.1 11106 . T G 0 PASS NS=1:DP=19 GT:GQ:DP 0/1:0:19
NW_022882922.1 17828 . C G 0 PASS NS=1:DP=27 GT:GQ:DP 0/1:0:27
NW_022882922.1 25160 . G Y 0 PASS NS=1:DP=37:GU=T/C GT:GQ:DP 0/1:0:37
NW_022882922.1 27396 . G A,R 0 PASS NS=1:DP=33:GU=A/G GT:GQ:DP 1/2:0:33
NW_022882922.1 28342 . G A,R 0 PASS NS=1:DP=27:GU=A/G GT:GQ:DP 1/2:0:27
NW_022882922.1 28895 . C T 0 PASS NS=1:DP=32 GT:GQ:DP 0/1:0:32
NW_022882922.1 29887 . C A 0 PASS NS=1:DP=35 GT:GQ:DP 0/1:0:35
NW_022882922.1 40905 . T C,Y 0 PASS NS=1:DP=17:GU=T/C GT:GQ:DP 1/2:0:17
NW_022882922.1 43671 . A C 0 PASS NS=1:DP=11 GT:GQ:DP 0/1:0:11
NW_022882922.1 43859 . A T 0 PASS NS=1:DP=18 GT:GQ:DP 0/1:0:18
NW_022882922.1 46336 . G A,R 0 PASS NS=1:DP=26:GU=A/G GT:GQ:DP 1/2:0:26
When I tried to combine the VCF files with GATK, I got an error caused by these degenerate alleles.
I have preprocessed my samples in two different ways.
My aims are twofold. First, I want to count the degenerate sites (those with R or Y as an alternative allele) and estimate what percentage of all sites they represent. I can count the total number of sites with bcftools, but I don't know how to count the degenerate ones. Second, once I know which preprocessing is better, I would like to filter out those degenerate bases/sites to build my final dataset.
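To make the two aims concrete, here is a minimal Python sketch of the counting and filtering steps. It is an illustration under stated assumptions, not a tested pipeline: the function names are hypothetical, records are split on any whitespace (real VCF files are tab-delimited, but the excerpts above show spaces), a site counts as degenerate if any ALT allele contains an IUPAC ambiguity code other than N, and filtering drops the whole site rather than only the degenerate allele.

```python
# IUPAC ambiguity codes other than N (assumption: N is left alone because
# the VCF specification allows it in ALT).
AMBIGUITY_CODES = set("RYSWKMBDHV")


def is_degenerate_allele(allele):
    """Return True if a plain-sequence ALT allele contains an ambiguity code."""
    # Skip symbolic alleles such as <DEL>, breakends, and '*' / '.' placeholders,
    # whose letters are not bases.
    if allele.startswith("<") or "[" in allele or "]" in allele or allele in {"*", "."}:
        return False
    return any(base in AMBIGUITY_CODES for base in allele.upper())


def has_degenerate_alt(record_line):
    """Return True if any ALT allele (5th column) of a VCF record is degenerate."""
    fields = record_line.split()  # splits on tabs or spaces
    return any(is_degenerate_allele(a) for a in fields[4].split(","))


def count_degenerate(lines):
    """Return (total_sites, degenerate_sites) for an iterable of VCF lines."""
    total = degenerate = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        total += 1
        if has_degenerate_alt(line):
            degenerate += 1
    return total, degenerate


def drop_degenerate(lines):
    """Yield header lines and every record without a degenerate ALT allele."""
    for line in lines:
        if line.startswith("#") or not line.strip() or not has_degenerate_alt(line):
            yield line


def percent_degenerate(path):
    """Percentage of sites in the VCF at `path` that have a degenerate ALT."""
    with open(path) as handle:
        total, degenerate = count_degenerate(handle)
    return 100.0 * degenerate / total if total else 0.0
```

Calling `percent_degenerate` on the output of each preprocessing would give the figure to compare, and writing the lines produced by `drop_degenerate` to a new file would give a VCF without the R/Y sites. Whether dropping the whole site is appropriate for records such as `T,Y` with genotype `1/2` is a judgment call the sketch does not settle.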
Thanks.
I have tried filtering with bcftools and vcftools, but I could not find an option for this.