Reputation: 395
I have the following problem: I have 10 different FASTA files, each containing thousands of sequences. I would like to read all the sequences from each FASTA file and then (with paste) create one big file containing all of them.
My question is the following: how can I read from several files at the same time?
I tried:
a<-list.files()
and then
for (x in a) {
  temp <- read.table(x)
  seq <- summary(temp)
  print(seq)
}
but it doesn't work properly. I also tried the read.fasta command, but it gives me strange output (not all of the sequences).
Thank you very much for your help, it would be much appreciated!
Fabio
PS. I started working with R just one week ago, so please be patient even if this is a stupid question!
Upvotes: 1
Views: 8760
Reputation: 46866
Bioconductor has many packages for working with DNA sequences. Install the ShortRead package with
source("http://bioconductor.org/biocLite.R")
biocLite("ShortRead")
Load the library and consult the help page for readFasta
library(ShortRead)
?readFasta
Figure out a pattern (like the pattern argument of list.files) that matches the fasta files you want to read in, and read all fasta files matching the pattern into a single object
patt <- "fasta$"
fasta <- readFasta("/my/directory/containing/fasta/files", patt)
Then write the object out
writeFasta(fasta, "my_destination.fasta")
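If ShortRead is not available, the same concatenation can be sketched in base R by treating each FASTA file as plain text. This is only a minimal sketch under stated assumptions: the helper name concat_fasta and its arguments are hypothetical (not part of any package), every file is assumed to fit in memory and to end at a record boundary, and the output file is assumed not to match the pattern inside the input directory.

```r
# Hypothetical helper (not part of ShortRead or any other package):
# concatenate FASTA files as plain text, in alphabetical file order.
concat_fasta <- function(dir, pattern = "fasta$", out) {
  # Full paths of every file in `dir` whose name matches `pattern`
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  # Fail early if the pattern matched nothing
  stopifnot(length(files) > 0)
  # Read each file as a character vector of lines, then flatten into one vector
  lines <- unlist(lapply(files, readLines), use.names = FALSE)
  # Write all lines to the destination file
  writeLines(lines, out)
  # Return the input file names invisibly, for checking what was merged
  invisible(files)
}
```

It could be called as concat_fasta("/my/directory/containing/fasta/files", out = "my_destination.fasta"). Unlike readFasta, this does no validation of the sequences; it only copies lines.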
But R is not really the right tool if all you want is to concatenate files; you likely want to do more interesting things, some of which are described in the vignettes for ShortRead, Biostrings, and GenomicRanges
browseVignettes("ShortRead")
browseVignettes("Biostrings")
browseVignettes("GenomicRanges")
The Bioconductor mailing list is the best place to get support for Bioconductor packages.
Upvotes: 4