Plotly-igraph plotting a tree with variable number of children nodes?

Question

I wish to generate a graph that visualizes the structure of an XML file.

I created a list of nodes to represent the XML file.
Each node contains three strings: the XML tag, the attributes, and the content.
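For reference, here is a minimal sketch of how such a node list could be built with Python's standard xml.etree.ElementTree module. The tag names in the sample are hypothetical placeholders. Flattening the attributes into one string is an assumption about how the question stores them. XmlNode and build_node_list are illustrative names, not part of any library.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class XmlNode(NamedTuple):
    """Hypothetical node record: tag, attributes and text content as strings."""
    tag: str
    attributes: str
    content: str


def build_node_list(xml_text: str) -> list[XmlNode]:
    """Walk the document depth-first in document order and record each element."""
    root = ET.fromstring(xml_text)
    nodes = []
    for elem in root.iter():
        # Assumption: attributes are flattened into a single 'key="value"' string.
        attrs = " ".join(f'{k}="{v}"' for k, v in elem.attrib.items())
        # .text is None when an element has no direct text; strip the indentation.
        content = (elem.text or "").strip()
        nodes.append(XmlNode(elem.tag, attrs, content))
    return nodes


# Hypothetical sample: the real tag names are not shown in the post.
sample = """<entry>
  <accession version="3">AC116785</accession>
  <keywords>
    <keyword>HTG</keyword>
    <keyword>HTGS_PHASE2</keyword>
  </keywords>
</entry>"""

for node in build_node_list(sample):
    print(node)
```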

The XML file looks like this:



   
   
      Mus musculus clone RP24-146B1, WORKING DRAFT SEQUENCE, 10 ordered pieces.
   
   AC116785
   
      AC116785.3
      21703640
   
   
      HTG
      HTGS_PHASE2
      HTGS_DRAFT
      HTGS_FULLTOP
   
   
      house mouse.
      
         Mus musculus
         
            Eukaryota
            Metazoa
            Chordata
            Craniata
            Vertebrata
            Euteleostomi
            Mammalia
            Eutheria
            Rodentia
            Sciurognathi
            Muridae
            Murinae
            Mus
         
      
   
   
      
         
            Birren,B.
         
         Mus musculus, clone RP24-146B1
         
            Unpublished
         
      
      
         
            Birren,B.
         
         Direct Submission
         
            02-APR-2002
            Whitehead Institute/MIT Center for Genome Research, 320 Charles Street, Cambridge, MA 02141, USA
         
      
      
         
            Birren,B.
         
         Direct Submission
         
            08-JUL-2002
            Whitehead Institute/MIT Center for Genome Research, 320 Charles Street, Cambridge, MA 02141, USA
         
      
   
   
      
         Jul 8, 2002
         21700645
      
      Smit, A.F.A. ,  Green, P. (1996-1997)http://ftp.genome.washington.edu/RM/RepeatMasker.html
      Whitehead Institute/ MIT Center for Genome Research
      WIBR
      http://www-seq.wi.mit.edu
      sequence_submissions@genome.wi.mit.edu
      L25104
      146_B_1
      Plasmid; n/a; 100% of reads
      Dye-terminator Big Dye; 100% of reads
      Phrap; version 0.960731
      130058 bases at least Q40
      131186 bases at least Q30
      131595 bases at least Q20
      142000; agarose-fp
      132012; sum-of-contigs
      6.9 in Q20 bases; agarose-fp
      7.5 in Q20 bases; sum-of-contigs
      This is a 'working draft' sequence. It currently consists of 10 contigs. Gaps between the contigsare represented as runs of N. The order of the piecesis believed to be correct as given, however the sizesof the gaps between them are based on estimates that haveprovided by the submittor.This sequence will be replacedby the finished sequence as soon as it is available andthe accession number will be preserved.
      contig of 1178 bp in length
      gap of      100 bp
      contig of 1557 bp in length
      gap of      100 bp
      contig of 2450 bp in length
      gap of      100 bp
      contig of 2707 bp in length
      gap of      100 bp
      contig of 2196 bp in length
      gap of      100 bp
      contig of 2213 bp in length
      gap of      100 bp
      contig of 5815 bp in length
      gap of      100 bp
      contig of 15977 bp in length
      gap of      100 bp
      contig of 16111 bp in length
      gap of      100 bp
      contig of 81808 bp in length.
   
   
      
         1..132912
         taxon:10090
         RP24-146B1
         RPCI-24 Male Mouse BAC
      
      
         1..1178
      
      
         1279..2835
      
      
         2936..5385
      
      
         5486..8192
      
      
         8293..10488
      
      
         10589..12801
      
      
         12902..18716
      
      
         18817..34793
      
      
         34894..51004
      
      
         51105..132912
      
   
   
   mhkkiciigagaaglvsakhaikqgyqvdifeqtdqvggtwvysektgchsslykvmktn
lpkeamlfqdepfrdelpsfmshehvleylnefskdfpiqfsstvnevkrendlwkvlie
snsetitrfydvvfvcnghffeplnpyqnsyfkgklihshdyrraehytgknvvivgagp
sgiditlqiaqtanhvtliskkatypvlpesvqqmatnvksvdehgvvtdegdhvpadvi
ivctgyvfkfpfldssliqlkyndrmvsplyehlchvdypttlffiglplgtitfplfev
qvkyalsliagkgklpsddveirnfedarlqgllnpasfhviieeqweymkklakmggfe
ewnymetikklygyimterkknvigykmvnfelttdssdfklltirvdfnddvawiirfa
ypi

I wish to generate a tree plot using the Plotly and igraph libraries by enumerating the list of nodes.

I am using this website as a reference.

My XML file has elements with a variable number of sub-elements. However, the example only shows how to build a tree with a fixed number of child nodes (two children per node).

The igraph tutorial has a similar example, which also uses exactly two child nodes per node.

How should I go about generating a tree with a variable number of child nodes, as in my XML file?

I've been stuck on this for so long, any help would be greatly appreciated!
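The fixed branching factor in those examples appears to come only from how the demo tree is generated: Graph.Tree(n, 2) builds a regular tree with two children per node. A tree described as a plain list of (parent, child) edges can have any number of children per node. To make that concrete without depending on igraph, here is a minimal pure-Python sketch of a layered tree layout over such an edge list. tree_layout is a hypothetical helper, and it is a simpler technique than the Reingold-Tilford ('rt') layout used in the Plotly example; it does not compact subtrees.

```python
from collections import defaultdict


def tree_layout(edges, root=0):
    """Hypothetical helper: return {vertex: (x, depth)} for a rooted tree.

    Works for any number of children per node. Leaves get consecutive x
    positions from left to right, and each parent is centred between its
    first and last child. This is a simple layered layout, not igraph's
    Reingold-Tilford implementation.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    positions = {}
    next_leaf_x = 0

    def place(vertex, depth):
        nonlocal next_leaf_x
        kids = children[vertex]
        if not kids:  # leaf: take the next free column
            positions[vertex] = (float(next_leaf_x), depth)
            next_leaf_x += 1
            return
        for kid in kids:  # lay out every subtree first, left to right
            place(kid, depth + 1)
        first_x = positions[kids[0]][0]
        last_x = positions[kids[-1]][0]
        positions[vertex] = ((first_x + last_x) / 2, depth)

    place(root, 0)
    return positions


# Root 0 has three children; vertex 1 has four, vertex 2 none, vertex 3 one.
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (1, 6), (1, 7), (3, 8)]
print(tree_layout(edges))
```

The recursion depth equals the depth of the XML nesting, which is small for files like this one.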

Arpad Horvath -- Слава Україні · Accepted Answer

You can create the graph like this:

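from lxml import etree
from igraph import Graph

# Parse the XML file and get its root element.
root = etree.parse("entry.xml").getroot()

# Give every element (not every tag name) a unique integer id, in document order.
element_ids = {elem: i for i, elem in enumerate(root.iter())}

# One (parent_id, child_id) edge per direct child; each parent may have any number of children.
edges = []
for parent, parent_id in element_ids.items():
    for child in parent:  # iterating an element yields its direct children (getchildren() is deprecated)
        edges.append((parent_id, element_ids[child]))

G = Graph(edges)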

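The element_ids dictionary maps every element of the XML to its own id, like {<Element tag1>: 0, <Element tag2>: 1, <Element tag3>: 2}. The keys are the element objects themselves rather than the tag names, because the same tag can appear many times. That way you can look up the id of any element later.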
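To see why the keys should be the elements rather than the tag names, here is a small check. It uses the standard library's xml.etree.ElementTree instead of lxml only to keep the demo dependency-free, and the tag names are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with a repeated tag, like the keyword list in the question.
xml_text = """<entry>
  <keywords>
    <keyword>HTG</keyword>
    <keyword>HTGS_PHASE2</keyword>
    <keyword>HTGS_DRAFT</keyword>
  </keywords>
</entry>"""

root = ET.fromstring(xml_text)

# Keyed by element object: every element gets its own id.
element_ids = {elem: i for i, elem in enumerate(root.iter())}

# Keyed by tag name: repeated tags overwrite each other, so vertices are lost.
tag_ids = {elem.tag: i for i, elem in enumerate(root.iter())}

print(len(element_ids))  # 5
print(tag_ids)           # {'entry': 0, 'keywords': 1, 'keyword': 4}
```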

I don't know how to put labels into Plotly, but for plotting with igraph it can be useful to add the tag names as labels:

names = [e.tag for e in element_ids]  # dicts keep insertion order, so names[i] is the tag of vertex i
G.vs['label'] = names
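Tag names repeat a lot in this file, so for hover text it may help to include a short excerpt of each element's text as well. As far as I can tell, the Plotly tree-plot example passes its labels to go.Scatter as text=... together with hoverinfo='text', so one list ordered by vertex id could serve both igraph and Plotly. A minimal sketch follows; vertex_labels is a hypothetical helper, the demo uses the standard library's ElementTree, and it assumes every key is a real element (with lxml, comments would also appear in root.iter()).

```python
import xml.etree.ElementTree as ET


def vertex_labels(element_ids, max_len=40):
    """Hypothetical helper: one label per vertex id, tag plus a short text excerpt."""
    labels = [""] * len(element_ids)
    for elem, vertex_id in element_ids.items():
        text = (elem.text or "").strip()
        if len(text) > max_len:  # shorten long content such as sequences
            text = text[:max_len] + "..."
        labels[vertex_id] = f"{elem.tag}: {text}" if text else elem.tag
    return labels


# Hypothetical sample; the real tag names are not shown in the post.
root = ET.fromstring("<entry><accession>AC116785</accession><keywords/></entry>")
element_ids = {elem: i for i, elem in enumerate(root.iter())}
print(vertex_labels(element_ids))  # ['entry', 'accession: AC116785', 'keywords']
```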

I have not tried it, but once you have the graph, the Plotly visualization should work the same way as in the article.
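In the Plotly tree-plot example, the igraph layout is turned into node coordinates plus edge segments separated by None, and the y axis is flipped so the root is drawn at the top. None of that depends on how many children a node has. Here is a minimal sketch of that conversion. It assumes positions holds one (x, y) pair per vertex id, for instance list(G.layout('rt', root=[0])), and edges is the (parent, child) list built above. plotly_coords is a hypothetical helper.

```python
def plotly_coords(positions, edges):
    """Hypothetical helper: turn a layout and an edge list into Plotly-ready lists.

    positions: sequence of (x, y) pairs indexed by vertex id.
    Returns (xn, yn, xe, ye). Each edge becomes its own segment, followed by
    None so Plotly does not connect consecutive segments.
    """
    # Flip y so the root (y == 0 in a top-down tree layout) is drawn at the top.
    max_y = max(y for _, y in positions)
    xn = [x for x, _ in positions]
    yn = [max_y - y for _, y in positions]

    xe, ye = [], []
    for a, b in edges:
        xe += [positions[a][0], positions[b][0], None]
        ye += [max_y - positions[a][1], max_y - positions[b][1], None]
    return xn, yn, xe, ye


# A root with three children: the number of children does not matter here.
positions = [(0.0, 0.0), (-1.0, 1.0), (0.0, 1.0), (1.0, 1.0)]
edges = [(0, 1), (0, 2), (0, 3)]
xn, yn, xe, ye = plotly_coords(positions, edges)
print(xn, yn)
```

The edge lists can then go into go.Scatter(x=xe, y=ye, mode='lines') and the node lists into go.Scatter(x=xn, y=yn, mode='markers', text=names, hoverinfo='text').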
