Reputation: 1160
Given the graph below, I want to combine some of the edges by $name. It is easy to simplify a graph by merging edges that connect the same pair of vertices, but not by grouping them by an edge attribute, in this case $name.
g <- graph(c(1,2, 1,2, 1,2, 2,3, 3,4))
E(g)$weight <- 1:5
E(g)$name <- c("A", "A", "B", "C", "D")
When running the simplify function, or even as.directed/as.undirected, the names are dropped unless they are specified in the edge.attr.comb argument, which makes perfect sense. But when I do specify them, I can only choose max or min, or convert them into a string.
simplify(g, edge.attr.comb = list(weight = "sum"))
What I would like to end up with is a graph where the edges labelled A are merged (their weights summed), but the edge labelled B is kept as a separate parallel edge. I've tried several things, unsuccessfully.
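A runnable version of the simplify() call on the example graph shows why it does not give the desired result: simplify() groups edges by vertex pair, so the B edge is merged together with the two A edges. This sketch uses make_graph(), which builds the same graph as graph() here, and chooses name = "first" only so the name attribute survives for inspection; that choice is an assumption for illustration.

```r
library(igraph)

g <- make_graph(c(1,2, 1,2, 1,2, 2,3, 3,4))
E(g)$weight <- 1:5
E(g)$name <- c("A", "A", "B", "C", "D")

# simplify() merges edges by vertex pair, not by name: all three 1->2 edges
# (A, A and B) collapse into a single edge whose weight is 1 + 2 + 3.
s <- simplify(g, edge.attr.comb = list(weight = "sum", name = "first"))

ecount(s)      # 3 edges remain, so B is no longer a separate parallel edge
E(s)$weight
```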
EDIT: I am aware that I can convert the graph to a data frame, group the data there, and convert it back to a graph, or simply prepare the data frame before building the graph. However, that's too much fiddling around, and it would be easier to do it directly in igraph.
Upvotes: 2
Views: 1264
Reputation: 3647
You can do this by converting to a data frame and then back to a graph:
library(dplyr)
df <- igraph::as_data_frame(g)
df <- df %>% group_by(name) %>% mutate(weight = sum(weight)) %>% unique()
df
# A tibble: 4 x 4
# Groups:   name [4]
   from    to weight name
  <dbl> <dbl>  <int> <chr>
1  1.00  2.00      3 A
2  1.00  2.00      3 B
3  2.00  3.00      4 C
4  3.00  4.00      5 D
g2 <- igraph::graph_from_data_frame(df)
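The same round trip can be sketched in base R with aggregate() instead of dplyr. Note that this variant groups by from, to and name together rather than by name alone, so two same-named edges joining different vertex pairs would stay separate; that grouping choice is an assumption, not part of the answer above.

```r
library(igraph)

g <- make_graph(c(1,2, 1,2, 1,2, 2,3, 3,4))
E(g)$weight <- 1:5
E(g)$name <- c("A", "A", "B", "C", "D")

# Edge list with columns from, to, weight, name
df <- as_data_frame(g, what = "edges")

# Sum the weights within each (from, to, name) group
agg <- aggregate(weight ~ from + to + name, data = df, FUN = sum)

# from/to are the first two columns, so they become the edge list;
# name and weight are carried over as edge attributes
g2 <- graph_from_data_frame(agg)
E(g2)$weight
```

graph_from_data_frame() turns the numeric vertex ids into character vertex names ("1" to "4"), exactly as in the dplyr version.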
Sorry, just got back to this. Yeah, I don't think that exact function exists, though it would be nice to have. But you can do it in two steps: 1) aggregate the weights of edges that share a name, and 2) drop the duplicated edges.
library(dplyr)
library(microbenchmark)
library(igraph)
g <- graph(c(1,2, 1,2, 1,2, 2,3, 3,4))
E(g)$weight <- 1:5
E(g)$name <- c("A", "A", "B", "C", "D")
First, wrap the to-data-frame-and-back approach in a function:
to_df_and_back <- function(g) {
df <- igraph::as_data_frame(g)
df <- df %>% group_by(name) %>% mutate(weight = sum(weight)) %>% unique()
g2 <- igraph::graph_from_data_frame(df)
g2
}
Now make a function for the other approach: first recompute the edge weights by summing over edges with the same name, then subset the graph to one edge id per unique name:
add_then_subset <- function(g) {
E(g)$weight <- ave(E(g)$weight, names(E(g)), FUN=sum)
g2 <- subgraph.edges(g, eid = E(g)[unique(E(g)$name)])
g2
}
g1 <- to_df_and_back(g)
g2 <- add_then_subset(g)
identical(E(g1)$weight, E(g2)$weight)
#> [1] TRUE
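Both functions group by name only, and add_then_subset() keeps just the first edge for each name. If the same name can appear on different vertex pairs, a variant that keys on the endpoints as well may be closer to "merge parallel edges with the same name". The helper below, merge_parallel_by_name(), is hypothetical; it uses ends(), ave() and delete_edges() to sum weights per (from, to, name) key and then drop the duplicates.

```r
library(igraph)

# Hypothetical helper: sum weights over edges that share the same endpoints
# AND the same name, then keep only the first edge of each group.
merge_parallel_by_name <- function(g) {
  ep <- ends(g, E(g), names = FALSE)          # two-column matrix of vertex ids
  key <- paste(ep[, 1], ep[, 2], E(g)$name)   # one key per (from, to, name)
  E(g)$weight <- ave(E(g)$weight, key, FUN = sum)
  delete_edges(g, which(duplicated(key)))     # drop all but the first per key
}

g <- make_graph(c(1,2, 1,2, 1,2, 2,3, 3,4))
E(g)$weight <- 1:5
E(g)$name <- c("A", "A", "B", "C", "D")

g3 <- merge_parallel_by_name(g)
E(g3)$weight
E(g3)$name
```

On the example graph this gives the same weights as g1 and g2; it only differs when a name is reused on another vertex pair, where that edge is kept rather than dropped.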
The speed results here suggest the reweight-and-subset strategy is a good deal faster (its median time is about a quarter of the other's), but you'll want to test this on your data, as I don't know how it will scale.
microbenchmark(to_df_and_back(g), add_then_subset(g))
#> Unit: milliseconds
#>                expr      min       lq     mean   median       uq       max neval cld
#>   to_df_and_back(g) 4.588584 4.851213 6.901448 4.947683 5.130546 182.82945   100   b
#>  add_then_subset(g) 1.208795 1.314137 2.138570 1.382700 1.485809  70.16585   100  a
Upvotes: 2