frazman

Reputation: 33303

Not able to get my head around this python

I just implemented a hierarchical clustering by following the documentation here: http://www.mathworks.com/help/stats/hierarchical-clustering.html?s_tid=doc_12b

So, let me try to put down what I am trying to do. Take a look at the following figure:

[Figure: dendrogram]

Now, this dendrogram is generated from the following data:

                         node1        node2         dist(node1,node2)   num_elems
assigning index  37  to  [ 16.          26.           1.14749118   2.        ]
assigning index  38  to  [ 4.          7.          1.20402602  2.        ]
assigning index  39  to  [ 13.          29.           1.44708015   2.        ]
assigning index  40  to  [ 12.          18.           1.45827365   2.        ]
assigning index  41  to  [ 10.          34.           1.49607538   2.        ]
assigning index  42  to  [ 17.          38.           1.52565922   3.        ]
assigning index  43  to  [  8.          25.           1.58919037   2.        ]
assigning index  44  to  [  3.          40.           1.60231007   3.        ]
assigning index  45  to  [  6.          42.           1.65755731   4.        ]
assigning index  46  to  [ 15.          23.           1.77770844   2.        ]
assigning index  47  to  [ 24.          33.           1.77771082   2.        ]
assigning index  48  to  [ 20.          35.           1.81301111   2.        ]
assigning index  49  to  [ 19.         48.          1.9191061   3.       ]
assigning index  50  to  [  0.          44.           1.94238609   4.        ]
assigning index  51  to  [  2.         36.          2.0444266   2.       ]
assigning index  52  to  [ 39.          45.           2.11667375   6.        ]
assigning index  53  to  [ 32.          43.           2.17132916   3.        ]
assigning index  54  to  [ 21.         41.          2.2882061   3.       ]
assigning index  55  to  [  9.          30.           2.34492327   2.        ]
assigning index  56  to  [  5.          51.           2.38383321   3.        ]
assigning index  57  to  [ 46.          52.           2.42100025   8.        ]
assigning index  58  to  [ 28.          37.           2.48365024   3.        ]
assigning index  59  to  [ 50.          53.           2.57305009   7.        ]
assigning index  60  to  [ 49.          57.           2.69459675  11.        ]
assigning index  61  to  [ 11.          54.           2.75669475   4.        ]
assigning index  62  to  [ 22.          27.           2.77163751   2.        ]
assigning index  63  to  [ 47.          55.           2.79303418   4.        ]
assigning index  64  to  [ 14.          60.           2.88015327  12.        ]
assigning index  65  to  [ 56.          59.           2.95413905  10.        ]
assigning index  66  to  [ 61.          65.           3.12615829  14.        ]
assigning index  67  to  [ 64.          66.           3.28846304  26.        ]
assigning index  68  to  [ 31.         58.          3.3282066   4.       ]
assigning index  69  to  [ 63.          67.           3.47397104  30.        ]
assigning index  70  to  [ 62.          68.           3.63807605   6.        ]
assigning index  71  to  [  1.          69.           4.09465969  31.        ]
assigning index  72  to  [ 70.          71.           4.74129435  37.        ]

So basically, there are 37 points in my data, indexed from 0 to 36. For the i-th row in this list, I assign the index i + len(thiscompletelist) + 1. So, for example, when the id 37 shows up again in a later row, that means it is itself a branch (an earlier merge) rather than an original point. I used MATLAB to generate this image. But I want to query this information as query_node(node_id), so that it returns a list by level. For example, on query_node(37) I would get:

{ "left": {"level":1 {"id": 28}} , "right":{"level":0 {"left" :"id":16},"right":{"id":26}}}

Actually, I don't even know what the right data structure for this is. Basically, I want to query by node and gain some insight into what the structure of this dendrogram looks like when I am standing on that node and looking below. :(
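To make the question concrete, here is a minimal sketch of one possible structure: a plain dictionary from each assigned index to its two children, walked recursively. The names merges, query_node and leaves are hypothetical, and only three rows are copied from the list above, so any id missing from the dictionary is treated as an original data point.

```python
# Three rows copied from the list above: assigned index -> (node1, node2, dist).
# This is only a subset; the full dictionary would hold every row.
merges = {
    37: (16, 26, 1.14749118),
    58: (28, 37, 2.48365024),
    68: (31, 58, 3.3282066),
}


def query_node(node_id):
    """Return everything below node_id as nested dicts."""
    if node_id not in merges:
        # Not a merge, so treat it as an original data point (a leaf).
        return {"id": node_id}
    left, right, dist = merges[node_id]
    return {
        "id": node_id,
        "dist": dist,
        "left": query_node(left),
        "right": query_node(right),
    }


def leaves(node_id):
    """Return the original points under node_id, left to right."""
    if node_id not in merges:
        return [node_id]
    left, right, _ = merges[node_id]
    return leaves(left) + leaves(right)


print(query_node(58))
print(leaves(68))
```

With this layout, query_node(58) shows 28 on one side and the 16/26 pair on the other, which matches the three lines described in the figure.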

EDIT 1:

*Oh, I didn't know that you won't be able to zoom the image. Basically, the fourth element from the left is 28, and the green entry is the first row of the data.

So the fourth vertical line on the dendrogram represents 28.

Next to that line, the first green line represents 16,

and next to that, the second green line represents 26.*

Upvotes: 0

Views: 102

Answers (1)

Bula

Reputation: 1586

Well, it's always good to build on something that already exists, so take a look at dendrogram in scipy (scipy.cluster.hierarchy.dendrogram).
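As a rough sketch of how that could look: the same scipy module also has to_tree, which turns a linkage matrix into ClusterNode objects that can be looked up by id. The matrix Z below is a made-up toy example for 4 points, not the data from the question, and describe and query_node are hypothetical helper names. It assumes the rows follow scipy's linkage convention, where row i creates cluster id n + i for n original points, which appears to match the indexing in the question.

```python
import numpy as np
from scipy.cluster.hierarchy import to_tree

# Toy linkage matrix for 4 points (ids 0-3). Each row is
# [node1, node2, dist, num_elems]; row i creates cluster id 4 + i.
Z = np.array([
    [0.0, 1.0, 1.0, 2.0],  # cluster 4 = points 0 and 1
    [2.0, 3.0, 1.5, 2.0],  # cluster 5 = points 2 and 3
    [4.0, 5.0, 2.0, 4.0],  # cluster 6 = clusters 4 and 5
])

# rd=True also returns a list in which nodes[i] is the node with id i.
root, nodes = to_tree(Z, rd=True)


def describe(node):
    """Turn a ClusterNode into nested dicts."""
    if node.is_leaf():
        return {"id": node.id}
    return {
        "id": node.id,
        "dist": float(node.dist),
        "left": describe(node.left),
        "right": describe(node.right),
    }


def query_node(node_id):
    """Look up a node by id and describe everything below it."""
    return describe(nodes[node_id])


print(query_node(4))
print(nodes[6].pre_order())  # ids of the original points under node 6
```

Keeping the node list around means a query is just an index lookup followed by a walk down the subtree, so nothing has to be recomputed per query.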

Upvotes: 2
