FilipeTeixeira

Reputation: 1160

Merge edges by attribute/group in iGraph (R)

Given the graph below, I want to combine some of the edges by $name. Simplifying a graph by merging edges that connect the same pair of vertices is easy, but I cannot find a way to merge them by a grouping label, in this case $name.

g <- graph(c(1,2, 1,2, 1,2, 2,3, 3,4))
E(g)$weight <- 1:5
E(g)$name <- c("A", "A", "B", "C", "D")

When running the simplify function, or even as.directed/as.undirected, the names are dropped unless they are specified in the edge.attr.comb argument, which makes perfect sense. But when I do specify them, I can only choose max or min, or convert them into a string.

simplify(g, edge.attr.comb = list(weight = "sum"))

What I would like to end up with is a graph where the edges labelled A are merged and their weights summed, but the edge labelled B is kept as a parallel edge. I've tried several things without success.

EDIT: I am aware that I can convert the graph to a data frame, group the data there, and convert it back to a graph, or simply prepare the data frame before building the graph. However, that's too much fiddling around, and it would be easier to do it directly in igraph.

Upvotes: 2

Views: 1264

Answers (1)

gfgm

Reputation: 3647

You can do this by converting to a dataframe and then back to a graph:

library(dplyr)
df <- igraph::as_data_frame(g)
df <- df %>% group_by(name) %>% mutate(weight = sum(weight)) %>% unique()
df
# A tibble: 4 x 4
# Groups:   name [4]
#    from    to weight name
#   <dbl> <dbl>  <int> <chr>
# 1  1.00  2.00      3 A
# 2  1.00  2.00      3 B
# 3  2.00  3.00      4 C
# 4  3.00  4.00      5 D

g2 <- igraph::graph_from_data_frame(df)
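Note that grouping by name alone would also merge same-named edges that connect different pairs of vertices. If that is not what you want, a minimal sketch of a variant is to group by the endpoints as well and collapse each group with summarise(). The names merged and g_merged are just illustrative, and I use make_graph(), the current name for graph(), on the assumption that you have a reasonably recent igraph:

```r
library(igraph)
library(dplyr)

# Same toy graph as in the question
g <- make_graph(c(1,2, 1,2, 1,2, 2,3, 3,4))
E(g)$weight <- 1:5
E(g)$name <- c("A", "A", "B", "C", "D")

# Edge list with columns from, to, weight, name
df <- igraph::as_data_frame(g, what = "edges")

# One row per (from, to, name) combination, with the weights summed
merged <- df %>%
  group_by(from, to, name) %>%
  summarise(weight = sum(weight), .groups = "drop")

# Rebuild the graph; the remaining columns become edge attributes
g_merged <- graph_from_data_frame(merged, directed = is_directed(g))

E(g_merged)$name
E(g_merged)$weight
```

As with the version above, the round trip rebuilds the vertex set from the edge list, so isolated vertices are lost unless you also pass a vertices data frame.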

Edit

Sorry, I just got back to this. I don't think the exact function exists, and it would be nice to have. But you can do it in two steps: (1) aggregate the weights of edges that share a name, and (2) drop the duplicated edges.

library(dplyr)
library(microbenchmark)
library(igraph)
g <- graph(c(1,2, 1,2, 1,2, 2,3, 3,4))
E(g)$weight <- 1:5
E(g)$name <- c("A", "A", "B", "C", "D")

First, wrap the data-frame-and-back approach in a function:

to_df_and_back <- function(g) {
  df <- igraph::as_data_frame(g)
  df <- df %>% group_by(name) %>% mutate(weight = sum(weight)) %>% unique()
  g2 <- igraph::graph_from_data_frame(df)
  g2
}

Now we make a function for the other approach: first recompute the edge weights by summing over edges that share a name, then subset the graph to the first edge of each name:

add_then_subset <- function(g) {
  # Replace each weight with the total weight of its name group
  E(g)$weight <- ave(E(g)$weight, E(g)$name, FUN = sum)
  # Keep only the first edge seen for each name
  g2 <- subgraph.edges(g, eids = which(!duplicated(E(g)$name)))
  g2
}

g1 <- to_df_and_back(g)
g2 <- add_then_subset(g)

identical(E(g1)$weight, E(g2)$weight)
#> [1] TRUE
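If same-named edges should only be merged when they are also parallel (same endpoints), a rough sketch of an igraph-only variant is to build the grouping key from the endpoints plus the name. The function name merge_parallel_by_name is hypothetical, and this is an illustration under those assumptions rather than a tested general solution:

```r
library(igraph)

# Same toy graph as above (make_graph() is the current name for graph())
g <- make_graph(c(1,2, 1,2, 1,2, 2,3, 3,4))
E(g)$weight <- 1:5
E(g)$name <- c("A", "A", "B", "C", "D")

merge_parallel_by_name <- function(g) {
  # Two-column matrix of endpoint vertex ids, one row per edge
  el <- ends(g, E(g), names = FALSE)
  # Edges are merged only if endpoints and name all match
  key <- paste(el[, 1], el[, 2], E(g)$name, sep = "|")
  # Give every edge the summed weight of its group
  E(g)$weight <- ave(E(g)$weight, key, FUN = sum)
  # Drop every edge after the first one in each group
  delete_edges(g, which(duplicated(key)))
}

g3 <- merge_parallel_by_name(g)
E(g3)$name
E(g3)$weight
```

Because delete_edges() only removes edges, this version keeps every vertex of the original graph, including any that end up isolated.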

The timings below suggest that the reweight-and-subset strategy is a good deal faster (the median is roughly a quarter of the time), but you'll want to test this on your own data, as I don't know how it will scale.

microbenchmark(to_df_and_back(g), add_then_subset(g))
#> Unit: milliseconds
#>                expr      min       lq     mean   median       uq       max
#>   to_df_and_back(g) 4.588584 4.851213 6.901448 4.947683 5.130546 182.82945
#>  add_then_subset(g) 1.208795 1.314137 2.138570 1.382700 1.485809  70.16585
#>  neval cld
#>    100   b
#>    100  a

Upvotes: 2
