Reputation: 820
I am plotting a large graph (>10000 nodes; >10000 edges) using the igraph package with the Fruchterman-Reingold layout algorithm. A few outlier nodes make the visualization difficult: 99% of the nodes are huddled together, while the 1% of outlier nodes are located far away. For example, 99.9% of the nodes lie between 0 and 10, but 0.1% lie beyond 10000. How can I control these outlier nodes so that all nodes can be presented?
Here is an example in which the 0.2% of nodes that are outliers make it difficult to present the full graph.
> library(igraph)
> set.seed(12)
> ig <- erdos.renyi.game(12000,1/10000,directed=TRUE,loops=FALSE)
> ig.layout <- layout_with_fr(ig)
> apply(ig.layout,2,quantile,c(0,0.001,0.01,0.1,0.9,0.99,0.999,1))
[,1] [,2]
0% -54.7584289 -58.192821
0.1% -49.8806632 -51.090376
1% -29.7822097 -33.073435
10% -0.2196407 -1.170996
90% 10.1564691 10.513665
99% 2026.5245335 737.739440
99.9% 16433.7302032 13168.400710
100% 22614.7986797 22284.309659
Upvotes: 0
Views: 284
Reputation: 37661
One way to "control" the outliers is to get rid of them. This will reduce your initial problem, but you will still be stuck with a big graph that is hard to visualize. But let's deal with one thing at a time. First, the outliers.
Unfortunately, you set the seed after you generated the graph. I will move the set.seed statement first so that the results will be reproducible.
library(igraph)
set.seed(12)
ig <- erdos.renyi.game(12000,1/10000,directed=TRUE,loops=FALSE)
ig.layout <- layout_with_fr(ig)
apply(ig.layout,2,quantile,c(0,0.001,0.01,0.1,0.9,0.99,0.999,1))
[,1] [,2]
0% -5.359639e+01 -9.898871e+01
0.1% -4.996891e+01 -5.046219e+01
1% -3.040131e+01 -2.934615e+01
10% -1.221806e-02 1.513951e-02
90% 1.207328e+01 1.130579e+01
99% 1.111746e+03 6.994646e+02
99.9% 1.418739e+04 1.182382e+04
100% 1.968552e+04 2.025938e+04
I get a result comparable to yours. More to the point, the graph is badly warped by the outliers.
plot(ig, layout=ig.layout, vertex.size=4, vertex.label=NA,
edge.arrow.size=0.4)
But what are these outliers?
igComp = components(ig)
table(igComp$csize)
    1     2     3     4     5     6     7 10489
 1041   137    42     8     5     1     1     1
Your graph has one very large component and quite a few small components. The "outliers" are the nodes in the small, disconnected components. My suggestion is that if you want to see the graph, eliminate these small components. Just look at the big component.
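If you want to check for yourself that the far-away nodes really are the ones in the small components, you can cross-tabulate "far from the centre of the layout" against the size of each node's component. This is only a minimal sketch: the helper name `far_vertex_table` and the 99% distance cutoff are my own choices for illustration, not anything provided by igraph.

```r
library(igraph)

# Hypothetical helper (not part of igraph): cross-tabulate whether a vertex
# lies far from the centre of a layout against the size of its component.
# g      : an igraph graph
# layout : a two-column coordinate matrix, one row per vertex of g
# q      : distance quantile above which a vertex counts as "far" (assumed cutoff)
far_vertex_table <- function(g, layout, q = 0.99) {
  comp <- components(g)

  # Size of the component that each vertex belongs to
  comp_size <- comp$csize[comp$membership]

  # Euclidean distance of each vertex from the median position of the layout
  centre <- apply(layout, 2, median)
  dist_from_centre <- sqrt(rowSums(sweep(layout, 2, centre)^2))

  # Flag the vertices beyond the chosen distance quantile
  far <- dist_from_centre > quantile(dist_from_centre, q, names = FALSE)

  table(far = far, component_size = comp_size)
}

# Usage with the objects defined earlier in this answer:
# far_vertex_table(ig, ig.layout)
```

If the explanation above is right, the `TRUE` row of the resulting table should have its counts concentrated in the columns for small component sizes.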
# Select the largest component by size rather than assuming it has id 1
C1 = induced_subgraph(ig, which(igComp$membership == which.max(igComp$csize)))
set.seed(12)
C1.layout <- layout_with_fr(C1)
apply(C1.layout,2,quantile,c(0,0.001,0.01,0.1,0.9,0.99,0.999,1))
[,1] [,2]
0% -18.111038 -30.5068075
0.1% -11.257167 -14.4507491
1% -4.570292 -3.2830470
10% 0.124789 0.1836629
90% 7.182714 7.1506193
99% 12.291679 13.1523646
99.9% 26.812703 23.6325447
100% 35.186445 26.8564644
Now the layout is more reasonable.
plot(C1, layout=C1.layout, vertex.size=4, vertex.label=NA,
edge.arrow.size=0.4)
Now the "outliers" are gone and we see the core of the graph. You have a different problem now. It is hard to look at 10500 nodes and make sense of it, but at least you can see this core. I wish you luck with taking the exploration further.
Upvotes: 1