John L. Godlee
John L. Godlee

Reputation: 601

R ggnetwork facet by origin node - display destination nodes in each panel

library(dplyr)
library(ggnetwork)
library(ggplot)
library(igraph)
library(sna)

I have a data frame which looks like this, representing connections in a network between a number of objects:

origin <- c("A", "A", "B", "B", "C", "C", "B", "B")

dest <- c("D", "C", "D", "C", "B", "E", "E", "F")

net <- data.frame(origin, dest)

Then I summarise the data frame for use in ggnetwork like this, to show every combination of origin and destination as its own row:

df_edges <- net %>% group_by(origin, dest) %>% summarize(weight = n())

Then I convert to an igraph object, then a ggnetwork object like this:

net_igraph <- graph.data.frame(df_edges, directed = T)

df_net <- ggnetwork(net_igraph)

Finally, I want to plot in ggplot2. If I want to plot all connections together I can plot like this:

ggplot(df_net, aes(x = x, y = y, xend = xend, yend = yend, label = vertex.names)) + 
    geom_edges() +
    geom_nodetext() +
    geom_nodes()

But I want to plot as a facet_wrap, so that each origin is given its own panel, showing the connections to each connected destination. The problem is that when I plot like this, the destination nodes are not displayed:

ggplot(df_net, aes(x = x, y = y, xend = xend, yend = yend, label = vertex.names)) + 
    geom_edges() +
    geom_nodetext() +
    geom_nodes() + 
    facet_wrap(~ vertex.names)

How can I get the destination nodes to be displayed in each panel?

I looked on the help files for ggnetwork() and found to use the by = argument, but not sure what my chosen "edge attribute" would be.

Upvotes: 1

Views: 342

Answers (1)

Julius Vainora
Julius Vainora

Reputation: 48241

I couldn't find any direct way to achieve that, which is not surprising given that

head(df_net, 2)
#           x         y  na.x vertex.names      xend      yend na.y weight
# 1 1.0000000 0.1356215 FALSE            A 1.0000000 0.1356215   NA     NA
# 2 0.3039919 0.5152220 FALSE            B 0.3039919 0.5152220   NA     NA

That is, in every row there is only the origin vertex name. So, while adding the destination vertices is actually easy, adding their names requires some extra work.

The structure of df_net is such that first we have several (just as many as vertices) rows with weight being NA, those rows just define vertex positions (notice also that x coincides with xend and y with yend). Then we have just as many rows as edges corresponding to the edges, where to draw them.

However, there is an issue. For instance,

df_net[c(3, 7), ]
#            x        y  na.x vertex.names      xend       yend  na.y weight
# 3  0.4846586 0.000000 FALSE            C 0.4846586 0.00000000    NA     NA
# 31 0.3039919 0.515222 FALSE            B 0.4763860 0.02359162 FALSE      1

The second row corresponds to an edge from B to C. The problem is that xend and yend of the second row are not exactly as x and y of the first row. Hence, we cannot directly identify that this edge actually goes to C. For this purpose, we can use an approximate matching function defined as follows:

apprMatch <- function(x, y) apply(x, 1, function(z) which.min(colSums((t(y) - z)^2)))

It takes two matrices (two columns each) and for each row of x it finds the closest row of y. Given that the graph is not extremely dense, it should work without any problems (even when it is dense I would expect it to work).

Hence, let

ends1 <- with(df_net, cbind(xend, yend)[!is.na(weight), ])
ends2 <- with(df_net, cbind(x, y)[is.na(weight), ])

be those two matrices that we want to match. Then

df_net$to[!is.na(df_net$weight)] <- as.character(df_net$vertex.names[apprMatch(ends1,ends2)])

yields

tail(df_net, 2)
#    x         y  na.x vertex.names      xend        yend  na.y weight to
# 10 1 0.1356215 FALSE            A 0.5088354 0.006362567 FALSE      1  C
# 11 1 0.1356215 FALSE            A 0.8644390 0.614776499 FALSE      1  D

i.e., a destination vertex names column to. Thus, all in all we have

apprMatch <- function(x, y) apply(x, 1, function(z) which.min(colSums((t(y) - z)^2)))
ends1 <- with(df_net, cbind(xend, yend)[!is.na(weight), ])
ends2 <- with(df_net, cbind(x, y)[is.na(weight), ])
df_net$to[!is.na(df_net$weight)] <- as.character(df_net$vertex.names[apprMatch(ends1,ends2)])

ggplot(df_net, aes(x = x, y = y, xend = xend, yend = yend, label = vertex.names)) + 
  geom_edges() +
  geom_nodetext(vjust = 1, hjust = 1) + 
  geom_nodetext(aes(label = to, x = xend, y = yend), vjust = 1, hjust = 1) +
  geom_nodes() +
  geom_nodes(aes(x = xend, y = yend)) +
  facet_wrap(~ vertex.names)

enter image description here

where I also added vjust and hjust so that the vertex names are clearer.

Upvotes: 3

Related Questions