spiral01

Reputation: 545

Condensing phyloxml phylogenetic trees

I am trying to condense phylogenetic trees that are in the phyloxml format, that is, to simplify clades in which all leaves have the same label. A program called Newick Utils does this well for Newick-format trees, condensing this tree:

Original Tree

into this one:

Condensed tree

As I am eventually trying to split my gene tree into all of its subtrees at each duplication node, this is a useful way of reducing the number of subtrees without losing information.

Does anyone know a way of doing this with phyloxml trees? Newick Utils only accepts Newick format, so I need a way to parse the phyloxml format using Biopython. Thanks.

Upvotes: 1

Views: 361

Answers (1)

xbello

Reputation: 7443

As a quick answer, you can transform phyloxml into newick very easily:

from Bio import Phylo

Phylo.convert("original.xml", "phyloxml", "converted.newick", "newick")

Now you can run Newick Utils on converted.newick to condense the tree.


If you want to delete the leaves of a clade when they all have the same name:

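from Bio import Phylo

tree = Phylo.read("original.xml", "phyloxml")

# Snapshot the clades first, because pruning modifies the tree.
for clade in list(tree.find_clades()):
    if clade.count_terminals() > 1:
        leaves = clade.get_terminals()
        if len({leaf.name for leaf in leaves}) == 1:
            # All the leaves in this clade have the same name.
            # Prune them all except the first one.
            for leaf in leaves[1:]:
                tree.prune(leaf)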

Ideally, put the code above in a function that returns the pruned tree, and call that function again every time a leaf is pruned, since pruning changes the tree that is being traversed.
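One possible way to package this is sketched below. It is a minimal sketch, not a definitive implementation. The function names `condense_once` and `condense` are made up for illustration. The demo reads a small Newick string only to stay self-contained; the same calls should work on a tree loaded with `Phylo.read("original.xml", "phyloxml")`. Clades whose leaves are all unnamed are skipped, which is an assumption about what you want.

```python
from io import StringIO

from Bio import Phylo


def condense_once(tree):
    """Condense the first clade whose leaves all share one name.

    Returns True if something was pruned, False otherwise.
    (Hypothetical helper, not part of Biopython.)
    """
    for clade in tree.find_clades(terminal=False):
        leaves = clade.get_terminals()
        names = {leaf.name for leaf in leaves}
        # Assumption: clades made only of unnamed leaves are left alone.
        if len(leaves) > 1 and len(names) == 1 and None not in names:
            # Keep the first leaf and prune the rest of the clade.
            for leaf in leaves[1:]:
                tree.prune(leaf)
            # Stop here so the next pass walks the modified tree afresh.
            return True
    return False


def condense(tree):
    """Repeat condense_once until no clade can be condensed any further."""
    while condense_once(tree):
        pass
    return tree


if __name__ == "__main__":
    demo = Phylo.read(StringIO("((A,A),(B,(A,A)),C);"), "newick")
    condense(demo)
    print([leaf.name for leaf in demo.get_terminals()])
```

Restarting the traversal after each condensed clade is slower than a single pass, but it avoids walking a tree that is changing underneath the loop. The condensed tree can then be written back with `Phylo.write(tree, "condensed.xml", "phyloxml")`, where the output filename is just an example.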

Upvotes: 2
