user3683485
user3683485

Reputation: 155

How can I produce wig files from .sam files using a faster R script?

I have an R script in which I can read the lines from a .sam file after mapping and I want to parse lines of sam file into strings in order to be easier to manipulate them and create the wig files that I want or to calculate the cov3 and cov5 that I need. Can you help me please to make this script work faster? How can I parse lines of a huge .sam file into a data frame faster? Here is my script:

gc()
rm(list=ls()) 

exptPath <- "/home/dimitris/INDEX3PerfectUnique31cov5.sam"


lines <- readLines(exptPath)
pos = lines
pos
chrom = lines
chrom
pos = ""
chrom = ""
nn = length(lines)
nn

# parse lines of sam file into strings(this part is very very slow)
rr = strsplit(lines,"\t", fixed = TRUE)
rr
trr = do.call(rbind.data.frame, rr)
pos = as.numeric(as.character(trr[8:nn,4]))
# for cov3
#pos = pos+25
#pos
chrom = trr[8:nn,3]
pos = as.numeric(pos)
pos

tab1 = table(chrom,pos, exclude="")
tab1

ftab1 = as.data.frame(tab1)
ftab1 = subset(ftab1, ftab1[3] != 0)
ftab1 = subset(ftab1, ftab1[1] != "<NA>")
oftab1 = ftab1[ order(ftab1[,1]), ]
final.ftab1 = oftab1[,2:3]
write.table(final.ftab1, "ind3_cov5_wig.txt", row.names=FALSE,
            sep="   ", quote=FALSE)

Upvotes: 3

Views: 561

Answers (1)

Martin Morgan
Martin Morgan

Reputation: 46856

It's hard to provide a detailed answer without access to sample inputs and outputs (e.g., subsets of your data on dropbox). The Bioconductor solution would convert the sam file to bam

library(Rsamtools)
bam <- "/path/to/new.bam")
asBam("/path/to/old.sam", bam)

then read the data in, perhaps directly (see ?scanBam and ?ScanBamParam to import just the fields / regions of interest)

rr <- scanBam(bam)

or in the end more conveniently

library(GenomicAlignments)
aln <- readGAlignments(bam)
## maybe cvg <- coverage(bam) ?

There would be several steps to do your manipulations, ending with a GRanges object (sort of like a data.frame, but where the rows have genomic coordinates) or related object

## ...???
## gr <- GRanges(seqnames, IRanges(start, end), strand=..., score=...)

The end goal is to export to a wig / bigWig / bed file using

library(rtracklayer)
export(gr, "/path/to.wig")

There are extensive help resources, including package vignettes, man pages, and the Bioconductor mailing list

Upvotes: 1

Related Questions