Reputation: 123
I have many amino acids sequences like in a fasta format that I did multiple sequence alignment. I was trying to plot something like binary code as heatmap. If it had a change it would be red, if it did not change would be yellow, for example.
I came across to msaplot
from ggtree
package. I also checked ggmsa
package for that. But so far, I did not get what I wanted.
So basically I wanted to:
change the multiple sequence alignment to a binary matrix (if the amino differs from the reference sequence, plot x, if not y)
plot as a heatmap
A multiple sequence alignment is something like this for those who don't know.
I know that I should provide some type of data example but I am not sure how to create an example of multiple sequence alignment
if you install ggmsa
you can have an example of the data and plot in r using:
protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")
ggmsa(protein_sequences, start = 265, end = 300)
Upvotes: 1
Views: 1373
Reputation: 46898
We read in the alignment:
library(Biostrings)
library(ggmsa)
protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")
aln = readAAMultipleAlignment(protein_sequences)
ggmsa(protein_sequences, start = 265, end = 300)
Set the reference as the 1st sequence, some Rattus, you can also use the consensus with consensusString()
:
aln = unmasked(aln)
names(aln)[1]
[1] "PH4H_Rattus_norvegicus"
ref = aln[1]
Here we iterate through the sequence and make the binary for where the sequences are the same as the reference:
bm = sapply(1:length(aln),function(i){
as.numeric(as.matrix(aln[i])==as.matrix(ref))
})
bm = t(bm)
rownames(bm) = names(aln)
The plot you see above has sequences reversed, so to visualize the same thing we reverse it, and also subset on 265 - 300:
library(pheatmap)
pheatmap(bm[nrow(bm):1,265:300],cluster_rows=FALSE,cluster_cols=FALSE)
The last row, is Rattus, the reference, and anything similar to that is read, as you can see in the alignment above, last 4 sequences are identical.
Upvotes: 2