Reputation: 51
I'm trying to download a dataset in BAM format from GEO/SRA that I can analyse in RStudio.
I tried this method, where I downloaded the .sra file and converted it to .bam:
prefetch GSM2692389
sam-dump C:\Users\Desktop\sratoolkit.2.10.8-win64\bin\ncbi\SRA\sra\GSM2692389.sra --output-file GSM2692389.bam
However, this didn't work in RStudio: it returned an error saying it couldn't read the BAM file. This is my R code; I'm using Rsamtools:
> bamfiles <- list.files("directory redacted due to privacy", ".bam")
> file.exists(bamfiles)
[1] TRUE
>
>
> #---> Define bam files for count step on Rsamtools
>
> library("Rsamtools")
> bamfiles <- BamFileList(bamfiles, yieldSize=2000000)
> seqinfo(bamfiles)
Error in value[[3L]](cond) :
failed to open BamFile: SAM/BAM header missing or empty
file: 'GSM2692389.bam'
Does anyone know how to help me download the SRA data into readable .bam files? Any help or guidance would be much appreciated as I'm really trying to meet a deadline with this.
Upvotes: 1
Views: 8011
Reputation: 172
I'd say your problem is that you don't actually have BAM files! Right now, your command writes SAM output, which is plain text (hence the name sam-dump), and you're just saving it with a .bam extension. A simple test is to run head on your "bam files": if you can read them, they're not binary, which means they're not BAM. Otherwise, you can use samtools view, as bli suggested.
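If you'd rather not eyeball the output of head, here is a minimal sketch of the same check as a script. It relies on the fact that a BAM file is BGZF (gzip-compatible) compressed and that its decompressed content starts with the four bytes "BAM\1". The helper name is_bam is made up for illustration, and the sketch assumes a POSIX shell with gzip, head, od and tr available (on Windows, something like Git Bash or WSL).

```shell
# is_bam FILE (hypothetical helper): succeeds only if FILE decompresses as
# gzip/BGZF and its first four decompressed bytes are the BAM magic "BAM\1".
is_bam() {
    # Decompress, keep the first 4 bytes, and print them as hex without spaces.
    magic=$(gzip -dc "$1" 2>/dev/null | head -c 4 | od -An -tx1 | tr -d ' \n')
    # 42 41 4d 01 is "B", "A", "M", 0x01.
    [ "$magic" = "42414d01" ]
}

# Report on every file passed as an argument to the script.
for f in "$@"; do
    if is_bam "$f"; then
        echo "$f: BAM"
    else
        echo "$f: not BAM (plain-text SAM or something else)"
    fi
done
```

Saved as a script and run against GSM2692389.bam, a SAM file that was merely renamed should be reported as "not BAM".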
That being said, can you try this (make sure samtools is installed first):
sam-dump C:\Users\Desktop\sratoolkit.2.10.8-win64\bin\ncbi\SRA\sra\GSM2692389.sra | samtools view -bS - > GSM2692389.bam
Also, if you don't particularly need the .sra files themselves, you might as well use this, which is shorter and easier (and maybe faster as well):
sam-dump SRR5799988 | samtools view -bS - > GSM2692389.bam
I took the liberty of replacing your GSM number with the associated SRR number (see https://www.ncbi.nlm.nih.gov/sra?term=SRX2979455 ), but don't hesitate to double-check the SRR!
More information on sam-dump: https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=sam-dump
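Since the R error complained about a missing or empty header, it may be worth sanity-checking the converted file before going back to Rsamtools. The following is only a sketch: check_bam is a made-up helper name, and it assumes samtools is on your PATH and that you run it in a POSIX shell. It uses samtools quickcheck to test that the file looks like an intact BAM, then counts the header lines printed by samtools view -H.

```shell
# check_bam FILE (hypothetical helper): basic sanity checks on a BAM file.
check_bam() {
    # Bail out early if samtools is not installed.
    command -v samtools >/dev/null 2>&1 || { echo "samtools not found on PATH" >&2; return 1; }
    # quickcheck exits non-zero if the file is missing, truncated or not BAM-like.
    samtools quickcheck "$1" || { echo "$1: failed samtools quickcheck" >&2; return 1; }
    # Count the header lines; an empty header is what Rsamtools complained about.
    n=$(samtools view -H "$1" | wc -l | tr -d ' ')
    [ "$n" -gt 0 ] || { echo "$1: header is empty" >&2; return 1; }
    echo "$1: passed quickcheck, $n header line(s)"
}
```

For example, check_bam GSM2692389.bam should print a one-line summary if the conversion worked, and an error on stderr otherwise.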
Upvotes: 8