Angie

Reputation: 23

How to get a consensus of multiple sequence alignments using Biopython?

I am trying to get a consensus sequence from my multiple sequence alignment files (FASTA format).

I have a few FASTA files, each containing multiple sequence alignments. When I run the function below, I get AttributeError: 'generator' object has no attribute 'get_alignment_length'.

I haven't been able to find any code examples for this that use AlignIO.parse; I have only seen examples that use AlignIO.read.

from Bio import AlignIO
from Bio.Align import AlignInfo

def get_consensus_seq(filename):
    alignments = AlignIO.parse(filename, "fasta")
    summary_align = AlignInfo.SummaryInfo(alignments)
    consensus_seq = summary_align.dumb_consensus(0.7, "N")
    print(consensus_seq)

Upvotes: 2

Views: 1560

Answers (1)

Vovin

Reputation: 760

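If I understand your situation correctly, the problem is that SummaryInfo cannot be built from several alignments at once. AlignIO.parse returns a generator of alignments, whereas SummaryInfo expects a single alignment object, so the alignments should be merged into one first: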

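from __future__ import annotations

from itertools import chain
from pathlib import Path

from Bio import AlignIO
from Bio.Align import MultipleSeqAlignment
from Bio.Align.AlignInfo import SummaryInfo
from Bio.Seq import Seq


def get_consensus_seq(filename: Path | str) -> Seq:
    # AlignIO.parse yields one alignment at a time; flatten their records
    # into a single alignment, because SummaryInfo needs one alignment object.
    # All records must have the same length for the merge to succeed.
    common_alignment = MultipleSeqAlignment(
        chain.from_iterable(AlignIO.parse(filename, "fasta"))
    )
    summary = SummaryInfo(common_alignment)
    # Columns without a residue reaching the 0.7 majority are reported as "N".
    consensus = summary.dumb_consensus(0.7, "N")
    return consensus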
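
To see what merging the alignments and applying a 0.7 threshold amounts to, here is a rough pure-Python sketch that does not depend on Biopython. The function name threshold_consensus and the sample blocks are made up for illustration, and the rules (gaps ignored, ties reported as ambiguous) are my assumption of how a simple majority consensus behaves; Biopython's own implementation may differ in details.

```python
from collections import Counter
from itertools import chain

GAP_CHARS = {"-", "."}  # assumption: both are treated as gaps


def threshold_consensus(rows, threshold=0.7, ambiguous="N"):
    """Return a majority-rule consensus for equal-length aligned rows."""
    rows = list(rows)
    if not rows:
        return ""
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned rows must have the same length")

    consensus = []
    for column in zip(*rows):
        # Gap characters do not count towards the column total.
        counts = Counter(ch for ch in column if ch not in GAP_CHARS)
        total = sum(counts.values())
        if total == 0:
            consensus.append(ambiguous)
            continue
        ranked = counts.most_common(2)
        top_char, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        if not tied and top_count / total >= threshold:
            consensus.append(top_char)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


# Two hypothetical alignment blocks, standing in for what a parser yields.
blocks = [["ACGT", "ACGA"], ["ACTT", "TCGA"]]
merged_rows = chain.from_iterable(blocks)  # flatten into one set of rows
print(threshold_consensus(merged_rows))  # prints "ACGN"
```

The last column is split evenly between T and A, so no residue reaches the threshold and the ambiguous character is used instead.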

Upvotes: 4
