Reputation: 443

Make a consensus tree from several trees using Bio.Phylo

I'm interested in 4 housekeeping genes in Enterobacter genomes.

I have my housekeeping genes; I ran a BLAST search against NR and downloaded the aligned sequences.

I built phylogenetic trees with the MEGA7 software using the Maximum Likelihood method. Bootstrapping was performed with 200 iterations.

I exported my trees as newick files.

So now I have 4 trees for my 4 housekeeping genes, and I want to create a consensus tree from them.

I tried the consensus tree functions from Bio.Phylo (http://biopython.org/DIST/docs/api/Bio.Phylo.Consensus-module.html#strict_consensus) (http://biopython.org/wiki/Phylo). I chose the majority_consensus function and it works pretty well, but I have an issue.

My script looks like this:

import sys

from Bio import Phylo
from Bio.Phylo.Consensus import majority_consensus

# One newick file per housekeeping gene, given on the command line
fichier = sys.argv[1]
fichier2 = sys.argv[2]
fichier3 = sys.argv[3]
fichier4 = sys.argv[4]

tree1 = Phylo.read(fichier, 'newick')
tree2 = Phylo.read(fichier2, 'newick')
tree3 = Phylo.read(fichier3, 'newick')
tree4 = Phylo.read(fichier4, 'newick')

trees = tree1, tree2, tree3, tree4

# Majority-rule consensus with a cutoff of 0.5
majority_tree = majority_consensus(trees, 0.5)
Phylo.draw(majority_tree)

The problem is that the consensus tree depends on the order of the input trees. I get different results when I use, for example, trees = tree1,tree2,tree3,tree4 and trees = tree2,tree4,tree1,tree3.

Does anyone know of other software that builds a consensus tree from newick files?

I also need help with Bio.Phylo. If someone knows more about this package, that would be great.

Upvotes: 2

Answers (1)

BioGeek

Reputation: 22887

Since you didn't post your newick files, let's try to reproduce your problem so that we have a Minimal, Complete, and Verifiable example showing that the consensus tree depends on the order of the input trees.

We start with the following three trees, represented here in newick format:

newicks = {1: '((A,B,C),(D,(E,F)))',
           2: '(((A,B),C),(D,(E,F)))',
           3: '((A,B,C),(E,(D,F)))'}

Now we try all possible permutations of these three trees, create the consensus tree for each, and see if they are all the same:

from io import StringIO
from Bio import Phylo
from Bio.Phylo.Consensus import majority_consensus
from itertools import permutations

def read_newick(treedata):
    handle = StringIO(treedata)
    return Phylo.read(handle, "newick")

for keys in permutations(newicks.keys()):
    trees = [read_newick(newicks[key]) for key in keys]
    majority_tree = majority_consensus(trees, 0.5)
    print('majority consensus for order: {}'.format(keys))
    Phylo.draw_ascii(majority_tree)

Result:

majority consensus for order: (1, 2, 3)
                           ________________________ A
                          |
  ________________________|________________________ B
 |                        |
_|                        |________________________ C
 |
 |                         ________________________ D
 |________________________|
                          |                         ________________________ E
                          |________________________|
                                                   |________________________ F

majority consensus for order: (1, 3, 2)
                           ________________________ A
                          |
  ________________________|________________________ B
 |                        |
_|                        |________________________ C
 |
 |                         ________________________ D
 |________________________|
                          |                         ________________________ E
                          |________________________|
                                                   |________________________ F

majority consensus for order: (2, 1, 3)
                           ________________________ A
                          |
  ________________________|________________________ B
 |                        |
_|                        |________________________ C
 |
 |                         ________________________ D
 |________________________|
                          |                         ________________________ E
                          |________________________|
                                                   |________________________ F

majority consensus for order: (2, 3, 1)
                           ________________________ A
                          |
  ________________________|________________________ B
 |                        |
_|                        |________________________ C
 |
 |                         ________________________ D
 |________________________|
                          |                         ________________________ E
                          |________________________|
                                                   |________________________ F

majority consensus for order: (3, 1, 2)
                           ________________________ A
                          |
  ________________________|________________________ B
 |                        |
_|                        |________________________ C
 |
 |                         ________________________ E
 |________________________|
                          |                         ________________________ D
                          |________________________|
                                                   |________________________ F

majority consensus for order: (3, 2, 1)
                           ________________________ A
                          |
  ________________________|________________________ B
 |                        |
_|                        |________________________ C
 |
 |                         ________________________ E
 |________________________|
                          |                         ________________________ D
                          |________________________|
                                                   |________________________ F

So indeed, the last two consensus trees are different from the first four consensus trees.
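To compare the results without reading the drawings, one option is to reduce each tree to its set of clades, where a clade is just the set of leaf names below a node. The sketch below does this in plain Python on hand-written nested tuples that mirror the two drawings above; `leaf_sets`, `first_four`, `last_two` and `reordered` are hypothetical names for illustration and are not part of Bio.Phylo.

```python
def leaf_sets(tree):
    """Return every clade of a nested-tuple tree as a frozenset of leaf names."""
    found = set()

    def collect(node):
        if isinstance(node, str):  # a leaf is just its name
            return frozenset([node])
        members = frozenset().union(*(collect(child) for child in node))
        found.add(members)
        return members

    collect(tree)
    return found


# The two shapes drawn above, written out by hand
first_four = (('A', 'B', 'C'), ('D', ('E', 'F')))
last_two = (('A', 'B', 'C'), ('E', ('D', 'F')))
# The first shape again, with children listed in a different order
reordered = (('C', 'B', 'A'), (('F', 'E'), 'D'))

print(leaf_sets(first_four) == leaf_sets(reordered))  # True
print(leaf_sets(first_four) == leaf_sets(last_two))   # False
print(leaf_sets(first_four) ^ leaf_sets(last_two))    # clades found in only one of the two
```

Reordering children leaves the clade sets unchanged, so the second comparison shows that the two groups of results differ in more than drawing order: one contains the clade {E, F}, the other {D, F}.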

To understand why that happens, we look at the source code for majority_consensus(). There we see that the first step is to create the root clade. The order for the terminal clades is determined by the first tree that is provided.

So for (tree1, tree2, tree3), the first provided tree is tree1 and the root clade is ABCDEF. But for (tree3, tree2, tree1) the first tree provided is tree3 and the root clade becomes ABCEDF.
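To illustrate why the terminal order matters, here is a minimal sketch. It assumes, for illustration only, that a clade is encoded as a string of 0/1 flags, one per terminal, in the order taken from the first tree. The function `encode` and the two order lists are hypothetical names, and the actual Biopython internals may differ in detail.

```python
def encode(clade, terminal_order):
    """Encode a clade as one '0'/'1' flag per terminal, in the given order."""
    return ''.join('1' if name in clade else '0' for name in terminal_order)


order_from_tree1 = ['A', 'B', 'C', 'D', 'E', 'F']  # root clade ABCDEF
order_from_tree3 = ['A', 'B', 'C', 'E', 'D', 'F']  # root clade ABCEDF

print(encode({'E', 'F'}, order_from_tree1))  # 000011
print(encode({'E', 'F'}, order_from_tree3))  # 000101
print(encode({'D', 'F'}, order_from_tree3))  # 000011
```

Under the tree3 order, the pattern 000011 refers to D and F rather than E and F, so such a pattern only means something relative to the terminal order it was built with.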

So the order dependence comes from how majority_consensus() builds the tree: the first provided tree fixes the order of the terminals, and you get a different result depending on which tree comes first.
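As an independent cross-check, one can count how often each clade occurs in the input trees, treating a clade as a set of leaf names so that no terminal order is involved. The following is a rough sketch in plain Python, not a replacement for majority_consensus(): `newick_clades` and `majority_clades` are hypothetical helpers, the parser only handles topology-only newick strings (no branch lengths, support values or quoted labels), and using a strict "more than the cutoff" rule is my own choice.

```python
from collections import Counter
from itertools import permutations


def newick_clades(text):
    """Return the clades of a topology-only newick string as frozensets."""
    clades = set()
    stack = [set()]  # leaf names collected at each nesting level
    name = ''
    for char in text.strip().rstrip(';'):
        if char in '(),':
            if name:  # a leaf name just ended
                stack[-1].add(name)
                name = ''
            if char == '(':
                stack.append(set())
            elif char == ')':
                members = stack.pop()
                clades.add(frozenset(members))
                stack[-1] |= members
        else:
            name += char
    clades.discard(frozenset(stack[0]))  # drop the root clade (all leaves)
    return clades


def majority_clades(texts, cutoff=0.5):
    """Return the clades present in more than `cutoff` of the trees."""
    counts = Counter()
    for text in texts:
        counts.update(newick_clades(text))
    return {clade for clade, n in counts.items() if n / len(texts) > cutoff}


newicks = {1: '((A,B,C),(D,(E,F)))',
           2: '(((A,B),C),(D,(E,F)))',
           3: '((A,B,C),(E,(D,F)))'}

for order in permutations(newicks):
    supported = majority_clades([newicks[key] for key in order])
    print(order, sorted(''.join(sorted(clade)) for clade in supported))
```

For the three example trees this prints ABC, DEF and EF for every order, since E and F are grouped in trees 1 and 2 while D and F are grouped only in tree 3.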

Upvotes: 5
