Reputation: 545
I am trying to condense phylogenetic trees that are in the phyloxml format, that is, to collapse every clade in which all leaves have the same label into a single leaf. A program called Newick Utils does this very well for Newick format trees, condensing this tree:
into this one:
As I am eventually trying to split my gene tree into all its subtrees at each duplication node, this is a useful way of reducing the number of subtrees without losing information.
Does anyone know a way of doing this with phyloxml trees? Newick Utils only accepts Newick format, so I need a way to parse the phyloxml format using Biopython. Thanks.
Upvotes: 1
Views: 361
Reputation: 7443
As a quick answer, you can transform phyloxml into newick very easily:
    from Bio import Phylo

    Phylo.convert("original.xml", "phyloxml", "converted.newick", "newick")
Now you can run Newick Utils on the converted file to condense the tree.
If you would rather stay in Biopython and delete the extra leaves whenever all the leaves of a clade have the same name:
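    from Bio import Phylo

    tree = Phylo.read("original.xml", "phyloxml")

    # Take a snapshot of the clades first, because prune() modifies the tree.
    for clade in list(tree.find_clades()):
        if clade.count_terminals() > 1:
            leaves = clade.get_terminals()
            if len({leaf.name for leaf in leaves}) == 1:
                # All the leaves in this clade have the same name.
                # Prune them all except the first one.
                for leaf in leaves[1:]:
                    tree.prune(leaf)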
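Since pruning changes the tree while it is being traversed, ideally you would put the above code in a function that returns the new pruned tree, and call that function again every time a leaf is pruned, until there is nothing left to prune.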
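A minimal sketch of that idea, under a few assumptions: the function name `condense` is my own, the leaf labels are assumed to live in each leaf's `name` attribute (phyloxml files sometimes keep them in taxonomy or sequence elements instead, in which case the comparison would need adapting), and unnamed leaves are deliberately left alone. It works on a copy, prunes one duplicate leaf at a time, and restarts the traversal after every prune. The toy tree is parsed from a Newick string only to keep the example self-contained.

```python
import copy
from io import StringIO

from Bio import Phylo


def condense(tree):
    """Return a copy of `tree` in which each clade whose leaves all share
    one name has been reduced to a single leaf (hypothetical helper)."""
    tree = copy.deepcopy(tree)  # leave the caller's tree untouched
    pruned = True
    while pruned:
        pruned = False
        for clade in tree.find_clades():
            leaves = clade.get_terminals()
            names = {leaf.name for leaf in leaves}
            # Assumption: labels are stored in `name`; skip unnamed leaves.
            if len(leaves) > 1 and len(names) == 1 and None not in names:
                # Remove one duplicate, then restart the traversal, because
                # prune() has just changed the tree we were iterating over.
                tree.prune(leaves[1])
                pruned = True
                break
    return tree


# Toy input, for illustration only.
toy = Phylo.read(StringIO("((A,A),(B,(C,C)));"), "newick")
condensed = condense(toy)
print([leaf.name for leaf in condensed.get_terminals()])
```

For a real file you would read it with `Phylo.read("original.xml", "phyloxml")`, pass the result to `condense`, and write it back out with `Phylo.write`. Restarting after each prune is slower than a single pass, but it never touches a clade that an earlier prune has already collapsed.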
Upvotes: 2