Reputation: 99

Subtrees of Phylogenetic Trees in BioPython

I have a phylogenetic tree in newick format. I want to pull out a subtree based on the labels of the terminal nodes (so based on a list of species). A copy of the tree I am using can be found here: http://hgdownload.soe.ucsc.edu/goldenPath/dm6/multiz27way/dm6.27way.nh

Currently I have read in the tree using BioPython like so:

from Bio import Phylo
#read in Phylogenetic Tree
tree = Phylo.read('dm6.27way.nh', 'newick')
#list of species of interest
species_list = ['dm6', 'droSim1', 'droSec1', 'droYak3', 'droEre2', 'droBia2', 'droSuz1', 'droAna3', 'droBip2', 'droEug2', 'droEle2', 'droKik2', 'droTak2', 'droRho2', 'droFic2']

How would I pull out the subtree of only the species in species_list?

Upvotes: 2

Answers (3)

Roger V.

Reputation: 668

As @mitoRibo have pointed out in their comment to the OP, the question can be understood differently:

extracting the subtree rooted at the most recent common ancestor of the clades in the species_list (which may include other clades)

extracting the subtree that has as its clades only the species in the species_list

In the example given in the OP these two are indistinguishable, as species_list contains all the clades descendant from their common ancestor, so the answer by @mitoRibi does the trick. However, if we were interested in the second interpretation above, it wouldn't suffice. Indeed, if we limit the list to only 2 species:

from Bio import Phylo
#read in Phylogenetic Tree
tree = Phylo.read('dm6.27way.nh', 'newick')
species_list = ['dm6', 'droFic2']
ca = tree.common_ancestor(species_list)

the result will be still the tree with 15 clades (the same as in the original list in the OP.)

I therefore post the solution that extracts the subtree containing only the clades in the species tree:

from Bio import Phylo
from copy import deepcopy

#read in Phylogenetic Tree
tree = Phylo.read('dm6.27way.nh', 'newick')
#list of species of interest
species_list = ['dm6', 'droFic2']

subtree = deepcopy(tree)
for t in tree.get_terminals():
    if t.name not in species_list:
        subtree.prune(t.name)

with the result:

  ______________________________________________________________________ dm6
_|
 |___________________________________________ droFic2

This is admittedly a pedestrian solution, as looping over all clades might be a bit slow for big trees. However, to my knowledge, it is the only way it can be done with BioPython. This is why the solution with ete3, suggested by @IanFiddes, is well worth looking at.

Upvotes: 1

Ian Fiddes

Reputation: 3011

Instead of using BioPython, use ete3.

from ete3 import Tree
t = Tree('dm6.27way.nh')
t.prune(species_list, preserve_branch_length=True)
t.write()

From the documentation,

From version 2.2, this function includes also the preserve_branch_length flag, which allows to remove nodes from a tree while keeping original distances among remaining nodes.

Upvotes: 3

mitoRibo

Reputation: 4548

Ok yeah, assuming you want the smallest tree that has all the species in your species list you want the root node of this tree to be the most recent common ancestor (MRCA) of all the species in the list which is thankfully already implemented in Phylo:

from Bio import Phylo

#read in Phylogenetic Tree
tree = Phylo.read('dm6.27way.nh', 'newick')
#list of species of interest
species_list = ['dm6',
                'droSim1',
                'droSec1',
                'droYak3',
                'droEre2',
                'droBia2',
                'droSuz1',
                'droAna3',
                'droBip2',
                'droEug2',
                'droEle2',
                'droKik2',
                'droTak2',
                'droRho2',
                'droFic2']

common_ancestor = tree.common_ancestor(species_list)
Phylo.draw_ascii(common_ancestor)

output:

Clade
                                           ___ dm6
                                       ___|
                                      |   | , droSim1
                                      |   |_|
                            __________|     | droSec1
                           |          |
                           |          |  _____ droYak3
                          ,|          |_|
                          ||            |____ droEre2
                          ||
                          ||      _______ droBia2
                          ||_____|
                          |      |_____ droSuz1
                          |
                        __|                                  _______ droAna3
                       |  |_________________________________|
                       |  |                                 |________ droBip2
                       |  |
                       |  |___________________ droEug2
                       |
                       |_____________ droEle2
                      ,|
                      ||______________________________ droKik2
                    __||
                   |  ||______________ droTak2
___________________|  |
                   |  |____________ droRho2
                   |
                   |_______________ droFic2

Upvotes: 5

Subtrees of Phylogenetic Trees in BioPython

Answers (3)

Related Questions