Jacky

Reputation: 285

Visualize and clustering

Earlier I posted a question about visualization and clustering. I guess it was not clear enough, so I am posting it again, hopefully with a better explanation this time. I also apologize for not accepting answers on my old questions. I didn't know I could do that until someone pointed it out. I will definitely do it from now on.

Okay, back to the question. Previously I wrote a Python script to calculate the similarity between documents. Now I have all the data written out to Notepad, and it looks like this:

(1, 6821): inf

(1, 8): 3.458911570

(1, 9): 7.448105193

(1, 10): inf

(1, 11): inf

(6821, 8): inf

(6821, 9): inf

(6821, 10): inf

(6821, 11): inf

(8, 9): 2.153308936

(8, 10): inf

(8, 11): 16.227647992

(9, 10): inf

(9, 11): 34.943139430

(10, 11): inf

The numbers in parentheses are document numbers, and the value after them is the distance between the two documents. What I want is a visualization tool or method that lets me create a node for each document number. For example, here I have 6 different documents, so I want 6 nodes. Then I want edges connecting these nodes based on their distances. For example, the distance between documents 1 and 8 is 3.46, while the distance between documents 1 and 9 is 7.45, so 1 and 8 should be placed closer together than 1 and 9. Document pairs with an 'inf' distance shouldn't have any edge connecting them.

This sounds easy, but I am having a really hard time finding an open source visualization tool that can do this effectively. I would appreciate any suggestions or recommendations.

Upvotes: 1

Views: 1072

Answers (3)

msw

Reputation: 43487

http://www.graphviz.org/

In particular, the neato layout program:

$ cat similar.dot
graph g {
   // In neato, len is the preferred edge length; weight only sets how strongly it is enforced.
   n1 -- n8 [ len = 3.458911570 ];
   n1 -- n9 [ len = 7.448105193 ];
   n8 -- n9 [ len = 2.153308936 ];
   n8 -- n11 [ len = 16.227647992 ];
   n9 -- n11 [ len = 34.943139430 ];
   n10;
   n6821;
}
$ neato -Tpng similar.dot -o similar.png
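
Since the distances already come out of a Python script, the .dot file does not have to be typed by hand. Below is a rough sketch of how it could be generated, assuming the input lines look exactly like the ones in the question. The function names `parse_distances` and `to_dot` are made up for this example, and it writes each finite distance as neato's `len` attribute (the preferred edge length, in inches), which is one way to get closer documents drawn closer together. Pairs with an 'inf' distance get no edge, but every document is still declared as a node.

```python
import math
import re

# Matches lines such as "(1, 8): 3.458911570" or "(1, 10): inf".
LINE_RE = re.compile(r"\((\d+),\s*(\d+)\):\s*(\S+)")


def parse_distances(lines):
    """Return {(doc_a, doc_b): distance}; lines that do not match are skipped."""
    distances = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            a, b, value = match.groups()
            # float() understands the string "inf" as well as plain numbers.
            distances[(int(a), int(b))] = float(value)
    return distances


def to_dot(distances):
    """Build an undirected graph in DOT syntax for use with neato."""
    nodes = sorted({doc for pair in distances for doc in pair})
    out = ["graph g {"]
    # Declare every document so that isolated ones still show up.
    out += [f"   n{doc};" for doc in nodes]
    for (a, b), dist in distances.items():
        if math.isfinite(dist):  # 'inf' pairs get no edge
            out.append(f"   n{a} -- n{b} [ len = {dist} ];")
    out.append("}")
    return "\n".join(out)


# Possible usage (the input file name is only a placeholder):
#   with open("distances.txt") as f:
#       dot_text = to_dot(parse_distances(f))
#   with open("similar.dot", "w") as f:
#       f.write(dot_text + "\n")
```

The resulting similar.dot can then be rendered with the same neato command as above.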

Upvotes: 1

Martin Spa

Reputation: 1534

Processing is a really lovely tool for data visualization (and also a language, based on Java). Think of it as writing simplified OpenGL (you can even use OpenGL with it if you want) in Java, plus the freedom to use all the Java libraries. You can even embed your Processing app inside another Swing or AWT application.

Here's the main page, and the brand new wiki.

You said you used Python. There's a hack described in this blog post that lets you use Jython instead of Java. I haven't tried it, but maybe it works fine. The only drawback of using another language (there's also a JavaScript 'port', Processing.js) is that all the examples are written in the Processing language (based on Java).

Upvotes: 0

AlG

Reputation: 15157

Have you tried GraphViz? I use it for situations like this. I haven't tried altering the length of the node connections, so you'll have to tease that one out yourself. Check out the list of example graphs as a starting point.
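
On altering the connection lengths: in the neato and fdp layouts, an edge's `len` attribute sets its preferred length in inches, so raw distances like 34.9 would ask for a very large drawing. One possible approach, sketched here in Python since that is what the asker uses, is to rescale the finite distances into a small range first. The helper name `scale_lengths` and the 1 to 4 inch range are arbitrary choices for illustration, not anything GraphViz prescribes.

```python
import math


def scale_lengths(distances, shortest=1.0, longest=4.0):
    """Linearly map finite distances onto [shortest, longest].

    `distances` is a dict like {(1, 8): 3.46, (1, 10): float("inf")}.
    Pairs with an infinite distance are dropped, since they get no edge.
    """
    finite = [d for d in distances.values() if math.isfinite(d)]
    if not finite:
        return {}
    lo, hi = min(finite), max(finite)
    span = hi - lo
    scaled = {}
    for pair, dist in distances.items():
        if not math.isfinite(dist):
            continue
        # If all distances are equal, every edge gets the shortest length.
        frac = (dist - lo) / span if span else 0.0
        scaled[pair] = shortest + frac * (longest - shortest)
    return scaled
```

Each scaled value can then be written into the graph file as, for example, `n8 -- n9 [ len = 1.0 ];`. A linear mapping keeps the ordering of the distances but is only one choice; a logarithmic mapping might suit data with a few very large values better.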

Upvotes: 2
