fabioln79

Reputation: 395

How to read from multiple FASTA files with R?

I have the following problem: I have 10 different FASTA files, each containing thousands of sequences. I would like to read all the sequences from each FASTA file and then (with paste) create one big file containing all of them.

My question is the following: how can I read from several files at the same time?

I tried:

a<-list.files()

and then

for (x in a) {
  temp <- read.table(x)
  seq <- summary(temp)
  print(seq)
}

but it doesn't work properly. I also tried the read.fasta command, but it gives me strange output (not all of the sequences).

Thank you very much for your help, it will be very much appreciated!

Fabio

PS: I started working with R just one week ago, so please be patient even if this is a stupid question!

Upvotes: 1

Views: 8760

Answers (1)

Martin Morgan

Reputation: 46866

Bioconductor has many packages for working with DNA sequences. Install the ShortRead package with

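# biocLite() has been retired; current Bioconductor releases install via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("ShortRead")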

Load the library and consult the help page for readFasta

library(ShortRead)
?readFasta

Figure out a pattern (a regular expression, like the one list.files accepts) that matches the FASTA files you want to read in, and read all the files matching the pattern into a single object

patt <- "fasta$"
fasta <- readFasta("/my/directory/containing/fasta/files", patt)

Then write the object out

writeFasta(fasta, "my_destination.fasta")
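If the sequences do not need to be parsed at all, the same result can be sketched in base R by treating each FASTA file as plain text. This is only an illustration, not part of ShortRead: the function name concat_fasta and its arguments are made up here, and it assumes the files are well-formed, fit in memory, and that the output file is written outside the input directory.

```r
# Hypothetical helper (not from ShortRead): concatenate FASTA files as plain text.
# Assumes every input file is valid FASTA and small enough to hold in memory.
concat_fasta <- function(dir, out, pattern = "fasta$") {
  # list.files() returns matching paths in alphabetical order
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  stopifnot(length(files) > 0)

  # Read every file line by line and join the lines into one character vector
  lines <- unlist(lapply(files, readLines), use.names = FALSE)

  # Write all header and sequence lines to a single output file
  writeLines(lines, out)
  invisible(files)
}
```

A call might look like concat_fasta("/my/directory/containing/fasta/files", "my_destination.fasta"). Unlike readFasta, this does no validation of the records, so it only makes sense when the inputs are already known to be clean.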

But actually R would not be the right tool for just concatenating files; you likely want to do more interesting things, some of which are described in the vignettes for ShortRead, Biostrings, and GenomicRanges:

browseVignettes("ShortRead")
browseVignettes("Biostrings")
browseVignettes("GenomicRanges")

The Bioconductor mailing list is the best place to get support for Bioconductor packages.

Upvotes: 4
