Reputation: 155
I am using the R package msa
, a core Bioconductor package, for multiple sequence alignment. Within msa
, I am using the MUSCLE alignment algorithm to align protein sequences.
library(msa)
myalign <- msa("test.fa", method=c("Muscle"), type="protein",verbose=FALSE)
The test.fa file is a standard fasta as follows (truncated, for brevity):
>sp|P31749|AKT1_HUMAN_RAC
MSDVAIVKEGWLHKRGEYIKTWRPRYFLL
>sp|P31799|AKT1_HUMAN_RAC
MSVVAIVKEGWLHKRGEYIKTWRFLL
When I run the code on the file, I get:
MUSCLE 3.8.31
Call:
msa("test.fa", method = c("Muscle"), type = "protein", verbose = FALSE)
MsaAAMultipleAlignment with 2 rows and 480 columns
aln
[1] MSDVAIVKEGWLHKRGEYIKTWRPRYFLL
[2] MSVVAIVKEGWLHKRGEYIKTWR---FLL
Con MS?VAIVKEGWLHKRGEYIKTWR???FLL
As you can see, a very reasonable alignment.
I want to write the gapped alignment, preferably without the consensus sequence (e.g., Con row), to a fasta file. So, I want:
>sp|P31749|AKT1_HUMAN_RAC
MSDVAIVKEGWLHKRGEYIKTWRPRYFLL
>sp|P31799|AKT1_HUMAN_RAC
MSVVAIVKEGWLHKRGEYIKTWR---FLL
I checked the msa
help, and the package does not seem to have a built in method for writing out to any file type, fasta or otherwise.
The seqinr package looks somewhat promising, because maybe it could read this output as an msf format, albeit a weird one. However, seqinr
seems to need a file read in as a starting point. I can't even save this using write(myalign, ...)
.
Upvotes: 2
Views: 4774
Reputation: 1
You can convert your msa alignment first ("AAStringSet") into an "align" object first, and then export as fasta as follows:
library(msa)
library(bios2mds)
mysequences <-readAAStringSet("test.fa")
alignCW <- msa(mysequences)
#https://rdrr.io/bioc/msa/man/msaConvert.html
alignCW_as_align <- msaConvert(alignCW, "bios2mds::align")
export.fasta(alignCW_as_align, outfile = "test_alignment.fa", ncol = 60, open = "w")
Upvotes: 0
Reputation: 1
You may use export.fasta function from bio2mds library.
# reading of the multiple sequence alignment of human GPCRS in FASTA format:
aln <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds"))
export.fasta(aln)
Upvotes: 0
Reputation: 5024
I wrote a function:
alignment2Fasta <- function(alignment, filename) {
sink(filename)
n <- length(rownames(alignment))
for(i in seq(1, n)) {
cat(paste0('>', rownames(alignment)[i]))
cat('\n')
the.sequence <- toString(unmasked(alignment)[[i]])
cat(the.sequence)
cat('\n')
}
sink(NULL)
}
Usage:
mySeqs <- readAAStringSet('test.fa')
myAlignment <- msa(mySeqs)
alignment2Fasta(myAlignment, 'out.fasta')
Upvotes: 5
Reputation: 263332
I think you ought to follow the examples in the help pages that show input with a specific read
function first, then work with the alignment:
mySeqs <- readAAStringSet("test.fa")
myAlignment <- msa(mySeqs)
Then the rownames
function will deliver the sequence names:
rownames(myAlignment)
[1] "sp|P31749|AKT1_HUMAN_RAC" "sp|P31799|AKT1_HUMAN_RAC"
(Not what you asked for but possibly useful in the future.) Then if you execute:
detail(myAlignment) #function actually in Biostrings
.... you get a text file in interactive mode that you can save
2 29
sp|P31749|AKT1_HUMAN_RAC MSDVAIVKEG WLHKRGEYIK TWRPRYFLL
sp|P31799|AKT1_HUMAN_RAC MSVVAIVKEG WLHKRGEYIK TWR---FLL
If you wnat to try hacking a function for which you can get a file written in code, then look at the Biostrings detail
function code that is being used
> showMethods( f= 'detail')
Function: detail (package Biostrings)
x="ANY"
x="MsaAAMultipleAlignment"
(inherited from: x="MultipleAlignment")
x="MultipleAlignment"
showMethods( f= 'detail', classes='MultipleAlignment', includeDefs=TRUE)
Function: detail (package Biostrings)
x="MultipleAlignment"
function (x, ...)
{
.local <- function (x, invertColMask = FALSE, hideMaskedCols = TRUE)
{
FH <- tempfile(pattern = "tmpFile", tmpdir = tempdir())
.write.MultAlign(x, FH, invertColMask = invertColMask,
showRowNames = TRUE, hideMaskedCols = hideMaskedCols)
file.show(FH)
}
.local(x, ...)
}
Upvotes: 2